Abstract

This paper studies the problem of nonparametric testing for the effect of a random
functional covariate on a real-valued error term. The covariate takes values in
$L^2[0,1]$, the Hilbert space of the square-integrable real-valued functions on the unit
interval. The error term could be directly observed as a response or estimated from
a functional parametric model, such as the functional linear regression. Our
test is based on the remark that checking the no-effect of the functional covariate is
equivalent to checking the nullity of the conditional expectation of the error term
given a sufficiently rich set of projections of the covariate. Such projections could
be on elements of norm 1 from finite-dimension subspaces of $L^2[0,1]$. Next, the idea
is to search for a finite-dimension element of norm 1 that is, in some sense, the least
favorable for the null hypothesis. Finally, it remains to perform a nonparametric
check of the nullity of the conditional expectation of the error term given the scalar
product between the covariate and the selected least favorable direction. For such
finite-dimension search and nonparametric check we use a kernel-based approach.
As a result, our test statistic is a quadratic form based on univariate kernel smoothing
and the asymptotic critical values are given by the standard normal law. The
test is able to detect nonparametric alternatives, including the polynomial ones.
The error term could present heteroscedasticity of unknown form. We do not require
the law of the covariate $X$ to be known. The test could be implemented quite easily
and performs well in simulations and real data applications. We illustrate the
performance of our test for checking the functional linear regression model.
1 Introduction



Consider a sample of independent copies $(U_1, X_1), \dots, (U_n, X_n)$ of $(U, X)$ where $U$ is a
real-valued random variable and $X$ is a square-integrable random function defined on the
unit interval. The problem we investigate herein is the test of the hypothesis

$$H_0 : \mathbb{E}(U \mid X) = 0 \quad \text{almost surely (a.s.)} \qquad (1.1)$$

against the nonparametric alternative $\mathbb{P}[\mathbb{E}(U \mid X) = 0] < 1$. We consider two cases: (a)
$U$ is directly observed; and (b) $U$ is not observed and is estimated as a residual of a
parametric model for functional covariates and scalar responses.

There has been substantial recent work on the theoretical study of the functional data 
analysis. The monographs of Ramsay and Silverman (2002, 2005) and Ferraty (2011) 
provide a comprehensive landscape of the importance of the statistical methods for func- 
tional data. Estimation and prediction with functional covariates received substantial 
attention in the literature: for example by Ferraty and Vieu (2006), Cai and Hall (2006), 
Hall and Horowitz (2007), Crambes, Kneip and Sarda (2008), Yao and Müller (2010) and
the references therein. 

The goodness-of-fit problem we address seems to be much less explored. There is a
large literature on model checks like (1.1) against nonparametric alternatives when $X$
takes values in a finite-dimension space, see for instance Härdle and Mammen (1993),
Stute (1997), Horowitz and Spokoiny (2001), Guerre and Lavergne (2005). In the case of
a functional covariate $X$, much less work has been done on testing against general types
of alternatives. To the best of our knowledge, the only contribution considering the problem of
testing $H_0$ against nonparametric alternatives in the cases (a) and (b) is the recent paper
of Delsol, Ferraty and Vieu (2011), who extend the idea of Härdle and Mammen (1993)
to the functional covariate case. However, their results are derived under some strong
assumptions, for instance on the rates of convergence of the so-called
small ball probabilities and on the law of the covariate $X$, which are supposed to be known.
It is not clear how the test of Delsol, Ferraty and Vieu (2011) could be easily applied in
practice, for instance for testing the goodness-of-fit of the functional linear model. Some
more substantial work was done on testing for no effect in a functional linear model, see
Cardot, Ferraty, Mas and Sarda (2003), Cardot, Goia and Sarda (2007), or on testing the
functional linear model against quadratic alternatives, see Horváth and Reeder (2011).
By construction, such procedures are not able to detect general departures from the null
hypothesis.

The test we introduce herein is based on a dimension reduction idea used by Lavergne 
and Patilea (2008) in a finite dimension setup. Our test is able to detect nonparametric 
alternatives, including the polynomial ones. The variable U could be heteroscedastic and 
we do not require the conditional variance of $U$ given $X$ to be known. We do not require
the law of the covariate X to be given or to be of a certain type, like for instance Gaussian. 
The test could be implemented quite easily and performs well in simulations and real data 
applications. 
The paper is organized as follows. In section 2 we introduce the main notation and
we derive a fundamental lemma for our approach. This lemma shows that checking
condition (1.1) is equivalent to checking the nullity of the conditional expectation of $U$
given a sufficiently rich set of projections of $X$ on elements of norm 1 from finite-dimension
subspaces of $L^2[0,1]$. Next, the idea is to search in finite-dimension subspaces of $L^2[0,1]$ for a
least favorable element of norm 1 and to check the nullity of the conditional expectation
of $U$ given the scalar product between $X$ and the selected least favorable direction. In
section 3 we introduce the test statistic for testing the no-effect of $X$ on $U$ when $U$ is
observed. Our statistic is a quadratic form, based on univariate kernel smoothing, that
behaves like a standard normal random variable under $H_0$. We prove that, under mild
integrability or boundedness assumptions, the induced test is consistent against any type
of fixed alternatives and against sequences of directional alternatives approaching the null
hypothesis at a suitable rate. The allowed rates are almost the same as those obtained
in parametric model checks based on kernel smoothing with univariate covariate, see
for instance Guerre and Lavergne (2005) or Lavergne and Patilea (2008). In section
4 we apply our projection-based approach for nonparametric checks of the functional
regression models. We will focus on the linear functional model, although, at the expense
of longer arguments, the methodology we propose also adapts to other models, for
instance the generalized functional linear models introduced by Müller and Stadtmüller
(2005). In the functional regression case the variable $U$ is the unobserved error term of the
regression model and hence the test statistic is based on the estimated residuals. We still
obtain standard normal critical values and consistency against nonparametric alternatives,
fixed or approaching the null hypothesis. However, more restrictive conditions on the
bandwidths are required due to the estimation of the slope of the functional linear model.
This induces restrictions on the rate at which the directional alternatives may approach the null
hypothesis. The more difficult the estimation of the slope parameter is, the slower the rate at which the
directional alternatives approach the null hypothesis should be. For estimating the slope
parameter in the functional linear regression model we will focus on the standard approach
based on functional principal component analysis. In section 5 an empirical study is
reported. First, a wild bootstrap procedure is proposed as a means to approximate the
critical values of the test statistic. Then, the results of a simulation study are briefly
explained. The conclusion is that the test works well in practice. Under the null, the level
is quite well respected and the power is more than acceptable even in comparison
with parametric tests. The proposed test is consistent under general alternatives. Some
advice and comments are provided about the choice of the parameters involved in the
new test. The test is applied to check the goodness-of-fit of the functional linear model
and the functional quadratic model for the Tecator data set. Both models are rejected,
which indicates that more flexible models should be considered, for instance the
semiparametric index models introduced by Chen, Hall and Müller (2011). The proofs of
our theoretical results are relegated to the appendix.
2 Dimension reduction in nonparametric testing



Let us introduce some notation. For any $p \ge 1$, let $\mathcal{S}^p = \{\gamma \in \mathbb{R}^p : \|\gamma\| = 1\}$ denote
the unit hypersphere in $\mathbb{R}^p$. Let $L^2[0,1]$ be the space of the square-integrable real-valued
functions defined on the unit interval and let $\langle \cdot, \cdot \rangle$ denote the inner product in $L^2[0,1]$, that is for
any $X_1, X_2 \in L^2[0,1]$

$$\langle X_1, X_2 \rangle = \int_0^1 X_1(t) X_2(t)\, dt.$$

Let $\|\cdot\|_{L^2}$ be the associated norm. Hereafter $\mathcal{R} = \{\rho_1, \rho_2, \dots\}$ will be an arbitrarily fixed
orthonormal basis of the function space $L^2[0,1]$, that is $\langle \rho_i, \rho_j \rangle = \delta_{ij}$. Then the predictor
process $X$ can be expanded into

$$X(t) = \sum_{j=1}^{\infty} x_j \rho_j(t), \qquad (2.2)$$

where the random coefficients $x_j$ are given by $x_j = \langle X, \rho_j \rangle$. For a fixed positive integer
$p$, $X^{(p)} \in L^2[0,1]$ will be the projection of $X$ on the subspace generated by the first $p$
elements of the basis $\mathcal{R}$, that is

$$X^{(p)}(t) = \sum_{j=1}^{p} x_j \rho_j(t).$$

Let us notice that $\|X^{(p)}\|_{L^2}$ coincides with the Euclidean norm of the vector $(x_1, \dots, x_p)$
in $\mathbb{R}^p$. By abuse we also identify $X^{(p)}$ with the $p$-dimension random vector $(x_1, \dots, x_p)$.
On the other hand, for any integer $p \ge 1$ and non random vector $\gamma = (\gamma_1, \dots, \gamma_p) \in \mathbb{R}^p$,
we consider by abuse $\gamma$ an element in $L^2[0,1]$ with $(\gamma_1, \dots, \gamma_p, 0, 0, \dots)$ the coefficients of
its expansion and hence $\langle X, \gamma \rangle = \langle X^{(p)}, \gamma \rangle = \sum_{j=1}^{p} x_j \gamma_j$. In the following we will also use
$\beta = \sum_{j=1}^{\infty} b_j \rho_j(t)$ to denote a non random element of $L^2[0,1]$.
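The identification of $X^{(p)}$ with its coefficient vector can be checked numerically. The following is a minimal sketch, assuming a curve observed on an equispaced grid, trapezoidal quadrature, and the cosine basis $\rho_1 = 1$, $\rho_j(t) = \sqrt{2}\cos((j-1)\pi t)$; the basis, the grid and the helper names `trapezoid_weights`, `cosine_basis` and `scores` are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def trapezoid_weights(t):
    """Quadrature weights approximating integrals over an equispaced grid t."""
    w = np.full(t.size, t[1] - t[0])
    w[0] *= 0.5
    w[-1] *= 0.5
    return w

def cosine_basis(p, t):
    """First p elements of the cosine basis of L^2[0,1] (an assumed choice of basis)."""
    B = np.ones((p, t.size))
    for j in range(1, p):
        B[j] = np.sqrt(2.0) * np.cos(j * np.pi * t)
    return B

def scores(X, B, w):
    """Coefficients x_j = <X, rho_j> for each curve stored as a row of X."""
    return (X * w) @ B.T

t = np.linspace(0.0, 1.0, 2001)
w = trapezoid_weights(t)
p = 5
B = cosine_basis(p, t)
gram = (B * w) @ B.T                      # approximately the identity matrix

# One illustrative curve lying in the span of the first p basis elements.
coef = np.array([0.5, -1.0, 0.25, 0.0, 2.0])
X = coef @ B                              # X(t) = sum_j x_j rho_j(t)
x = scores(X[None, :], B, w)[0]           # recovered coefficients x_1, ..., x_p

# ||X^(p)||_{L^2} versus the Euclidean norm of (x_1, ..., x_p).
norm_L2 = np.sqrt(np.sum(w * X**2))

# <X, gamma> computed as an integral and as sum_j x_j gamma_j.
gamma = np.ones(p) / np.sqrt(p)
inner_integral = np.sum(w * X * (gamma @ B))
inner_coordinates = x @ gamma
```

In this representation every quantity of the form $\langle X_i, \gamma \rangle$ reduces to a matrix-vector product between the matrix of scores and $\gamma$.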

Our approach relies on the following lemma, an extension of Lemma 2.1 of Lavergne
and Patilea (2008) and Theorem 1 in Bierens (1990) to Hilbert space-valued conditioning
random variables. The result shows that for checking nullity of a conditional expectation,
it is equivalent to consider expectations conditional on $X$ and expectations conditional
on $L^2[0,1]$ projections of $X$ on a sufficiently rich set of directions.

Lemma 2.1 Let $X \in L^2[0,1]$ and $Z \in \mathbb{R}$ be random variables. Assume that $\mathbb{E}|Z| < \infty$
and $\mathbb{E}(Z) = 0$.

(A) The following statements are equivalent:

1. $\mathbb{E}(Z \mid X) = 0$ a.s.

2. $\mathbb{E}(Z \mid \langle X, \beta \rangle) = 0$ a.s. $\forall \beta \in L^2[0,1]$ with $\|\beta\|_{L^2} = 1$.

3. for any integer $p \ge 1$, $\mathbb{E}(Z \mid \langle X, \gamma \rangle) = 0$ a.s. $\forall \gamma \in \mathcal{S}^p$.

4. for any integer $p \ge 1$, $\mathbb{E}(Z \mid X^{(p)}) = 0$ a.s.

(B) Suppose in addition that for any positive real number $s$,

$$\mathbb{E}(|Z| \exp\{s \|X\|\}) < \infty. \qquad (2.3)$$

If $\mathbb{P}[\mathbb{E}(Z \mid X) = 0] < 1$, then there exists a positive integer $p_0 \ge 1$ such that for any
integer $p \ge p_0$, the set

$$\{\gamma \in \mathcal{S}^p : \mathbb{E}(Z \mid \langle X, \gamma \rangle) = 0 \ \text{a.s.}\}$$

has Lebesgue measure zero on the unit hypersphere $\mathcal{S}^p$ and is not dense.

Point (A) is a cornerstone for proving the behavior of our test under the null and the
alternative hypothesis. Point (B) shows that in applications it will not be difficult to find
directions $\gamma$ able to reveal the failure of the null hypothesis (1.1). Under the additional
assumption (2.3) such directions represent almost all the points on the unit hyperspheres
$\mathcal{S}^p$, provided $p$ is sufficiently large. The assumption (2.3) is not restrictive for testing
purposes. Indeed, if $X$ does not satisfy condition (2.3), it suffices to transform $X$ into
some variable $W \in L^2[0,1]$ such that the $\sigma$-field generated by $W$ is the same as the one
generated by $X$ and the variable $W$ satisfies condition (2.3).* Clearly, when $U$ is the error
term in some functional regression model for which one wants to check the goodness-of-fit,
one should use a transformation of $X$ only after estimating the errors in the model.
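Point (B) can be illustrated on a toy example, given here as a hedged sketch rather than as part of the paper's argument. Assume $p = 2$, independent standard normal coefficients $x_1, x_2$ and $Z = x_1 x_2$, so that $\mathbb{E}(Z) = 0$ while $\mathbb{E}(Z \mid X) = Z$ is not zero. For a unit vector $\gamma$ and Gaussian coefficients one gets $\mathbb{E}(Z \mid \langle X, \gamma \rangle = t) = \gamma_1 \gamma_2 (t^2 - 1)$, which vanishes only when $\gamma_1 \gamma_2 = 0$, that is on four points of the unit circle. The Monte Carlo check below estimates $\mathbb{E}[Z(t^2 - 1)]$ with $t = \langle X, \gamma \rangle$, which equals $2\gamma_1\gamma_2$ in this setting; the sample size and the function name `moment` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
Z = x1 * x2                       # E(Z) = 0, but Z is a nonzero function of X

def moment(direction, Z, x1, x2):
    """Sample version of E[Z (t^2 - 1)] with t = <X, gamma>."""
    t = direction[0] * x1 + direction[1] * x2
    return np.mean(Z * (t**2 - 1.0))

# Coordinate direction: E(Z | x_1) = 0, so the moment is close to 0.
m_axis = moment(np.array([1.0, 0.0]), Z, x1, x2)

# Diagonal direction: E(Z | t) = (t^2 - 1) / 2, so the moment is close to 1.
m_diag = moment(np.array([1.0, 1.0]) / np.sqrt(2.0), Z, x1, x2)
```

A direction picked at random on the circle reveals the departure from the null with probability one, in line with the measure-zero statement of the lemma.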

The following new formulations of $H_0$ are direct consequences of Lemma 2.1-(A).

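Corollary 2.2 Consider a real-valued random variable $U$ such that $\mathbb{E}|U| < \infty$. Let
$\omega(\beta, t)$, $\beta \in L^2[0,1]$ and $t \in \mathbb{R}$, be a real-valued function such that $\omega(\beta, \langle X, \beta \rangle) > 0$ for
all $\|\beta\|_{L^2} = 1$. For any $p \ge 1$, let $\omega_p(\gamma, t)$, $\gamma \in \mathbb{R}^p$ and $t \in \mathbb{R}$, be a real-valued function
such that $\omega_p(\gamma, \langle X, \gamma \rangle) > 0$ for all $\|\gamma\| = 1$. The following statements are equivalent:

1. The null hypothesis (1.1) holds true.

2.

$$\max_{\beta \in L^2[0,1],\ \|\beta\|_{L^2} = 1} \mathbb{E}\left[ U\, \mathbb{E}(U \mid \langle X, \beta \rangle)\, \omega(\beta, \langle X, \beta \rangle) \right] = 0. \qquad (2.4)$$

3. for any $p \ge 1$ and any set $B_p \subset \mathcal{S}^p$ with strictly positive Lebesgue measure on the
unit hypersphere $\mathcal{S}^p$,

$$\max_{\gamma \in B_p} \mathbb{E}\left[ U\, \mathbb{E}(U \mid \langle X, \gamma \rangle)\, \omega_p(\gamma, \langle X, \gamma \rangle) \right] = 0. \qquad (2.5)$$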

*For instance, given $X = \sum_{j \ge 1} x_j \rho_j$ one may build $w_j = a_j \arctan(x_j)$, where the $a_j$ are non random
such that $\sum_{j \ge 1} a_j^2 < \infty$, and may use the bounded random function $W = \sum_{j \ge 1} w_j \rho_j \in L^2[0,1]$ (bounded
means $\|W\|$ is a bounded random variable) instead of $X$ in the conditioning.
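The transformation described in the footnote can be sketched on truncated coefficient vectors. This is only an illustration under stated assumptions: the expansion is cut at $p$ terms, the weights are $a_j = 1/j$ (nonzero and square-summable), and the heavy-tailed coefficients are simulated. Since $|\arctan| < \pi/2$, the norm of $W$ is bounded by $(\pi/2)(\sum_j a_j^2)^{1/2}$, and the coordinate-wise map is invertible, so conditioning on $W$ or on $X$ carries the same information.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10

# Heavy-tailed coefficients x_j for n curves (simulated for illustration).
x = 10.0 * rng.standard_t(df=3, size=(n, p))

a = 1.0 / np.arange(1, p + 1)          # assumed weights a_j = 1/j
w = a * np.arctan(x)                   # w_j = a_j * arctan(x_j)

# With an orthonormal basis, ||W|| is the Euclidean norm of (w_1, ..., w_p).
norm_W = np.linalg.norm(w, axis=1)
bound = (np.pi / 2.0) * np.linalg.norm(a)

# The map x_j -> w_j is one-to-one, so X can be recovered from W.
x_back = np.tan(w / a)
```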
3 Testing the effect of a functional covariate

We introduce a general approach for nonparametric testing of the effect of a functional
covariate $X$ on a real-valued random variable $U$. For simplicity, here we assume that
$\mathbb{E}(U) = 0$; the nonzero mean case is contained in the setup considered in section 4 below.
Our approach is based on Corollary 2.2-(3) and univariate kernel smoothing. In this way
we avoid the problem of smoothing in infinite-dimension, in particular we avoid using the
small ball function required in the kernel regression with functional covariates, see Ferraty
and Vieu (2006), Delsol, Ferraty and Vieu (2011).

To avoid handling denominators close to zero, we set the weight function $\omega_p(\gamma, \cdot)$ in
Corollary 2.2 equal to the density of $\langle X, \gamma \rangle$, denoted by $f_\gamma(\cdot)$, which is assumed to exist
for any $\gamma$. For any $\gamma \in \mathcal{S}^p$, let

$$Q(\gamma) = \mathbb{E}\left\{ U\, \mathbb{E}[U \mid \langle X, \gamma \rangle]\, f_\gamma(\langle X, \gamma \rangle) \right\} = \mathbb{E}\left\{ \mathbb{E}^2[U \mid \langle X, \gamma \rangle]\, f_\gamma(\langle X, \gamma \rangle) \right\}.$$

For any $p \ge 1$, let $B_p \subset \mathcal{S}^p$ be a set with strictly positive Lebesgue measure in $\mathcal{S}^p$. By
Corollary 2.2, the null hypothesis (1.1) holds true if and only if

$$\forall p \ge 1, \qquad \max_{\gamma \in B_p} Q(\gamma) = 0. \qquad (3.1)$$

3.1 The test statistic

In view of equation (3.1), our goal is to estimate $Q(\gamma)$. Given a sample of $(U, X)$,
define

$$Q_n(\gamma) = \frac{1}{n(n-1)} \sum_{1 \le i \ne j \le n} U_i U_j \frac{1}{h} K_h(\langle X_i - X_j, \gamma \rangle), \qquad \gamma \in \mathcal{S}^p,$$

where $K_h(\cdot) = K(\cdot / h)$, $K(\cdot)$ is a kernel and $h$ a bandwidth. In the case of finite
dimension covariates, the function $\gamma \mapsto Q_n(\gamma)$ is the statistic considered by Lavergne and
Patilea (2008), see also Bierens (1990). For fixed $p$ and $\gamma \in \mathcal{S}^p$, it is well-known that $Q_n(\gamma)$
has asymptotic centered normal distribution with rate $nh^{1/2}$ under $H_0$; see for instance
Guerre and Lavergne (2005). We will show that the asymptotic normal distribution is
preserved even when $p$ grows at a suitable rate with the sample size. On the other hand,
Lemma 2.1-(B) indicates that if $p$ is large enough, the maximum of $Q(\gamma)$ over $\gamma$ stays
away from zero when $H_0$ fails.

For a fixed $p$ the statistic $Q_n(\gamma)$ is expected to be close to $Q(\gamma)$ uniformly in $\gamma$. Then
a natural idea would be to build a test statistic using the maximum of $Q_n(\gamma)$ with respect
to $\gamma$. However, there is an additional difficulty one faces in the functional data framework
since then one has to let $p$ grow to infinity with the sample size, and hence the closeness
between $Q_n(\gamma)$ and $Q(\gamma)$ requires a more careful investigation. On the other hand, like
in the finite dimension covariate case, under $H_0$ one expects $Q_n(\gamma)$ to converge to zero
for any $p$ and $\gamma$ and thus the objective function of the maximization problem to be flat.

We will choose a direction $\gamma$ as the least favorable direction for the null hypothesis $H_0$
obtained from a penalized criterion based on a standardized version of $Q_n(\gamma)$. Lavergne
and Patilea (2008) and Bierens (1990) considered this idea using $Q_n(\gamma)$. Here we use
a standardized version of $Q_n(\gamma)$. More precisely, fix some $\beta_0 \in L^2[0,1]$ that could be
interpreted as an initial guess of an unfavorable direction for $H_0$. Let $b_{0j}$, $j \ge 1$, be
the coefficients in the expansion of $\beta_0$ in the basis $\mathcal{R}$. For any given $p \ge 1$ such that
$\sum_{j=1}^{p} b_{0j}^2 > 0$, let

$$\gamma_0^{(p)} = \frac{(b_{01}, \dots, b_{0p})}{\left( \sum_{j=1}^{p} b_{0j}^2 \right)^{1/2}} \in \mathcal{S}^p.$$

Let $\hat v_n^2(\cdot)$ be an estimate of the variance of $nh^{1/2} Q_n(\cdot)$. Given $B_p \subset \mathcal{S}^p$ with strictly
positive Lebesgue measure in $\mathcal{S}^p$ that contains $\gamma_0^{(p)}$, the least favorable direction $\gamma$ for $H_0$
is defined as

$$\hat\gamma_n = \arg\max_{\gamma \in B_p} \left[ nh^{1/2} Q_n(\gamma) / \hat v_n(\gamma) - \alpha_n \mathbf{1}_{\{\gamma \ne \gamma_0^{(p)}\}} \right], \qquad (3.2)$$

where $\mathbf{1}_A$ is the indicator function of a set $A$, and $\alpha_n$, $n \ge 1$, is a sequence of positive
real numbers increasing to infinity at an appropriate rate that depends on the sample
size and the rates of $h$ and $p$ and that will be made explicit below. Let us notice that
the maximization used to define $\hat\gamma_n \in \mathcal{S}^p$ is a finite dimension optimization problem. The
choice of $\beta_0$, and thus of $\gamma_0^{(p)}$, is theoretically irrelevant: it does not affect the asymptotic
critical values and the consistency results. However, in practice the choice of $\beta_0$ could
be related to a priori information of the practitioner on a class of alternatives, for
instance the class of functions depending only on $\langle X, \beta_0 \rangle$. The empirical investigation we
report in section 5 suggests that working with a standardized version of $Q_n(\gamma)$ simplifies
the choice of $\alpha_n$ in applications.

We will prove that with suitable rates of increase for $\alpha_n$ and $p$ and decrease for $h$, the
probability of the event $\{\hat\gamma_n = \gamma_0^{(p)}\}$ tends to 1 under $H_0$. Hence $Q_n(\hat\gamma_n) / \hat v_n(\hat\gamma_n)$ behaves
asymptotically like $Q_n(\gamma_0^{(p)}) / \hat v_n(\gamma_0^{(p)})$, even when $p$ grows with the sample size. Therefore
the test statistic we consider is

$$T_n = nh^{1/2} \frac{Q_n(\hat\gamma_n)}{\hat v_n(\hat\gamma_n)}. \qquad (3.3)$$

We will show that an asymptotic $\alpha$-level test is given by $\mathbb{I}(T_n \ge z_{1-\alpha})$, where $z_{1-\alpha}$ is the
$(1 - \alpha)$-th quantile of the standard normal distribution.
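For a fixed direction, $Q_n(\gamma)$ is a U-statistic that depends on the covariates only through the scalar projections $\langle X_i, \gamma \rangle$. The following is a minimal sketch of its computation, assuming the curves are already represented by their first $p$ basis coefficients (rows of `scores`) and a Gaussian kernel; the function names, the simulated data and the bandwidth value are illustrative assumptions.

```python
import numpy as np

def gauss_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def Qn(U, scores, gamma, h, kernel=gauss_kernel):
    """Q_n(gamma) = 1/(n(n-1)) * sum_{i != j} U_i U_j h^{-1} K(<X_i - X_j, gamma> / h)."""
    n = U.size
    proj = scores @ gamma                       # <X_i, gamma> = sum_j x_ij gamma_j
    Kmat = kernel((proj[:, None] - proj[None, :]) / h) / h
    np.fill_diagonal(Kmat, 0.0)                 # drop the i == j terms
    return U @ Kmat @ U / (n * (n - 1))

# Illustrative data: U is generated independently of the scores.
rng = np.random.default_rng(0)
n, p = 40, 4
scores = rng.standard_normal((n, p))
U = rng.standard_normal(n)
gamma = np.ones(p) / np.sqrt(p)                 # a direction on the unit hypersphere
h = 0.5
q_val = Qn(U, scores, gamma, h)
```

The matrix form costs $O(n^2)$ operations per direction, which matters because the search for $\hat\gamma_n$ evaluates the criterion at many directions.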



3.2 Estimating the variance 

To find the direction $\hat\gamma_n$ and to build the test statistic (3.3), we need to estimate in some
way the variance of $nh^{1/2} Q_n(\gamma)$. The approach that is expected not to inflate the variance
estimate under the alternatives, and thus to guarantee better power in small finite samples,
would involve the estimation of the conditional variance of $nh^{1/2} Q_n(\gamma)$ given the $X_i$'s, which
writes

$$\tau_n^2(\gamma) = \frac{2}{n(n-1)h} \sum_{j \ne i} \sigma_p^2(X_i^{(p)})\, \sigma_p^2(X_j^{(p)})\, K_h^2(\langle X_i - X_j, \gamma \rangle), \qquad (3.4)$$

where $\sigma_p^2(X^{(p)}) = \mathrm{Var}[U \mid X^{(p)}]$. An estimator can be easily obtained by replacing $\sigma_p^2(\cdot)$
with an estimate in the last expression. In theory, a good solution would be to use a
nonparametric estimate of the $p$-variate function $\sigma_p^2(\cdot)$, but this is practically infeasible
given that $p$ is expected to grow with the sample size. A simple and convenient
solution with high-dimension covariates is then

$$\hat\tau_n^2(\gamma) = \frac{2}{n(n-1)h} \sum_{j \ne i} U_i^2 U_j^2 K_h^2(\langle X_i - X_j, \gamma \rangle). \qquad (3.5)$$

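Since, under the null hypothesis, $\hat\gamma_n = \gamma_0^{(p)}$ with probability tending to 1, a first variance
estimator we propose is

$$\hat v_n^2(\hat\gamma_n) = \hat v_n^2(\hat\gamma_n, \gamma_0^{(p)}) = \min\left( \hat\tau_n^2(\hat\gamma_n),\ \hat\tau_n^2(\gamma_0^{(p)}) \right). \qquad (3.6)$$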

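On the other hand, let us notice that $\hat\tau_n^2(\gamma_0^{(p)}) - \mathbb{E}[\hat\tau_n^2(\gamma_0^{(p)})]$ is expected to converge to
zero. Moreover, under the null hypothesis

$$\mathbb{E}[\hat\tau_n^2(\gamma_0^{(p)})] = \mathbb{E}\left[ \mathbb{E}\left[ \hat\tau_n^2(\gamma_0^{(p)}) \mid \langle X_1, \gamma_0^{(p)} \rangle, \dots, \langle X_n, \gamma_0^{(p)} \rangle \right] \right]$$

$$= \mathbb{E}\left[ \frac{2}{n(n-1)h} \sum_{j \ne i} \mathrm{Var}[U_i \mid \langle X_i, \gamma_0^{(p)} \rangle]\, \mathrm{Var}[U_j \mid \langle X_j, \gamma_0^{(p)} \rangle]\, K_h^2(\langle X_i - X_j, \gamma_0^{(p)} \rangle) \right]$$

and $0 < \underline{\sigma}^2 \le \mathrm{Var}[U \mid \langle X, \gamma_0^{(p)} \rangle] \le \overline{\sigma}^2 < \infty$. Next, notice that the conditional variance
of $U$ given $\langle X, \beta_0 \rangle$ is the same under $H_0$ and under any alternative that depends only on
$\langle X, \beta_0 \rangle$. Finally, notice that in any case $\mathbb{E}(U^2) \ge \mathbb{E}[\mathrm{Var}(U \mid \langle X, \beta_0 \rangle)]$. All these facts
suggest that a compromise for estimating the variance of $nh^{1/2} Q_n(\gamma_0^{(p)})$ would be

$$\hat v_n^2 = \hat v_n^2(\gamma_0^{(p)}) = \frac{2}{n(n-1)h} \sum_{j \ne i} \hat\sigma^2_{\gamma_0^{(p)}}(\langle X_i, \gamma_0^{(p)} \rangle)\, \hat\sigma^2_{\gamma_0^{(p)}}(\langle X_j, \gamma_0^{(p)} \rangle)\, K_h^2(\langle X_i - X_j, \gamma_0^{(p)} \rangle), \qquad (3.7)$$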

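where $\hat\sigma^2_{\gamma_0^{(p)}}(\cdot)$ is some nonparametric estimate of the univariate function
$\sigma^2_{\gamma_0^{(p)}}(t) = \mathrm{Var}(U \mid \langle X, \gamma_0^{(p)} \rangle = t)$ satisfying the condition

$$\sup_{1 \le i \le n} \left| \frac{\hat\sigma^2_{\gamma_0^{(p)}}(\langle X_i, \gamma_0^{(p)} \rangle)}{\sigma^2_{\gamma_0^{(p)}}(\langle X_i, \gamma_0^{(p)} \rangle)} - 1 \right| = o_P(1). \qquad (3.8)$$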

Different nonparametric estimators can be used, for instance a kernel estimator like in
Lavergne and Patilea (2008). We will prove below that both variance estimators (3.6)
and (3.7) guarantee the standard normal asymptotic critical values and consistency of
our test. In simulations, better power under the alternative was obtained when using the
variance estimator (3.7). The drawback of this estimator is the computational cost and
the choice of an additional bandwidth for the estimate $\hat\sigma^2_{\gamma_0^{(p)}}(\cdot)$. However, common choices
of this bandwidth work well in practice.
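The simpler of the two variance estimators, (3.5) combined with the minimum in (3.6), only requires the squared responses and the squared kernel weights. The sketch below is one possible implementation under the same illustrative assumptions as before (coefficient representation of the curves, Gaussian kernel, simulated data); the names `tau2_hat` and `v2_hat_min` are hypothetical.

```python
import numpy as np

def gauss_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def tau2_hat(U, scores, gamma, h, kernel=gauss_kernel):
    """Estimator (3.5): 2/(n(n-1)h) * sum_{i != j} U_i^2 U_j^2 K^2(<X_i - X_j, gamma>/h)."""
    n = U.size
    proj = scores @ gamma
    K2 = kernel((proj[:, None] - proj[None, :]) / h) ** 2
    np.fill_diagonal(K2, 0.0)                   # drop the i == j terms
    U2 = U**2
    return 2.0 * (U2 @ K2 @ U2) / (n * (n - 1) * h)

def v2_hat_min(U, scores, gamma_hat, gamma0, h, kernel=gauss_kernel):
    """Estimator (3.6): the smaller of the two values of (3.5) at gamma_hat and gamma0."""
    return min(tau2_hat(U, scores, gamma_hat, h, kernel),
               tau2_hat(U, scores, gamma0, h, kernel))

# Illustrative data and directions.
rng = np.random.default_rng(1)
n, p = 40, 4
scores = rng.standard_normal((n, p))
U = rng.standard_normal(n)
gamma0 = np.array([1.0, 0.0, 0.0, 0.0])
gamma_hat = np.ones(p) / np.sqrt(p)
h = 0.5

t_hat = tau2_hat(U, scores, gamma_hat, h)
t_0 = tau2_hat(U, scores, gamma0, h)
v2 = v2_hat_min(U, scores, gamma_hat, gamma0, h)
```

Estimator (3.7) would additionally need a univariate nonparametric regression of $U^2$ on $\langle X, \gamma_0^{(p)} \rangle$ with its own bandwidth, which is not sketched here.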
3.3 Behavior under the null hypothesis

Let us introduce a first set of assumptions. Below $0_p \in \mathbb{R}^p$ denotes the null vector of
dimension $p$. Moreover, $\mathcal{F}[\cdot]$ denotes the Fourier transform, cf. Rudin (1987).

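Assumption D

(a) The random vectors $(U_1, X_1), \dots, (U_n, X_n)$ are independent draws from the random
vector $(U, X) \in \mathbb{R} \times L^2[0,1]$ that satisfies $\mathbb{E}|U|^m < \infty$ for some $m > 11$.

(b) $\exists\, \underline{\sigma}^2$ and $\overline{\sigma}^2$ such that $0 < \underline{\sigma}^2 \le \mathrm{Var}(U \mid X) \le \overline{\sigma}^2 < \infty$ almost surely.

(c) The sets $B_p \subset \mathcal{S}^p$, $p \ge 1$, appearing in (3.2) are such that:

(i) there exist constants $C_1, \delta > 0$ (independent of $n$ and $p$) such that $\forall p \ge 1$ and
$\forall \gamma \in B_p$, the variable $\langle X, \gamma \rangle$ admits a density $f_\gamma(\cdot)$ and

$$C_1^{-1} \le \int \left\{ f_\gamma^2\, \mathbb{I}(f_\gamma \le 1) + f_\gamma^{2+\delta}\, \mathbb{I}(f_\gamma > 1) \right\} \le C_1;$$

(ii) there exist $C_2, \epsilon > 0$ such that $\int_{|x| \le \epsilon} |\mathcal{F}[f_\gamma]|^2(x)\, dx \ge C_2$, $\forall p \ge 1$, $\forall \gamma \in B_p$;

(iii) the initial 'guess' $\beta_0$ satisfies the condition: $\exists\, C_3$ such that $f_{\gamma_0^{(p)}} \le C_3$, $\forall p \ge 1$;

(iv) $B_p \times 0_{p'-p} \subset B_{p'}$, $\forall\, 1 \le p < p'$.

Assumption K

(a) The kernel $K$ is a continuous density of bounded variation with strictly positive
Fourier transform on the real line.

(b) $h \to 0$ and $(nh^2)^{\alpha} / \ln n \to \infty$ for some $\alpha \in (0, 1)$.

(c) $p \ge 1$ increases to infinity with $n$ and there exists a constant $\lambda > 0$ such that $p \ln^{-\lambda} n$
is bounded.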

Let us comment on these assumptions. The bounded variation of $K$, which in particular
means that $K$ is bounded, is a very mild condition that allows one to easily bound covering numbers
of families of functions indexed by $\gamma$. Continuity and bounded variation guarantee that $K$
can be recovered by inverse Fourier transform. The role of the technical assumption of a positive
Fourier transform, which is satisfied by triangular, normal, logistic, Student, or Laplace densities, will
be explained below. In Assumption K-(c), it is also possible to let $p$ grow with the
sample size at a polynomial rate, instead of the logarithmic rate. However, we will see
below that, in theory, this could induce a loss of power for our test. There is a trade-off
between the moment conditions one imposes on $U$ and the range of rates allowed for the
bandwidth and the growth rate for $p$: higher moments will be needed for wider ranges
and faster rates for $p$. For bandwidths and $p$ satisfying Assumption K-(b,c) it suffices
to take $m > 11$ in Assumption D-(a); see the proof of Lemma 3.1. Let us notice that
Assumption D-(b) implies that $\forall p \ge 1$, $0 < \underline{\sigma}^2 \le \mathbb{E}(U^2 \mid X^{(p)}) \le \overline{\sigma}^2 < \infty$ almost surely.
Finally, let us comment on Assumption D-(c). On one hand, a key issue in the proof
of Lemma 3.1 below and some of the subsequent proofs will be to control the rate of
$\mathbb{E}[h^{-1} K_h(\langle X_1 - X_2, \gamma \rangle)]$ uniformly in $\gamma \in B_p$ as $p$ grows and $h$ decreases with the sample
size. To reduce technicalities, we choose the convenient solution that consists in trying to
bound this quantity by a constant. Using the Fourier transform and Plancherel theorem,
this is guaranteed by a condition like $\int_{\mathbb{R}} f_\gamma^2 \le C_1$, $\forall \gamma \in B_p$. In the proofs for the functional
linear model we have to strengthen this condition and add $\int_{\mathbb{R}} f_\gamma^{2+\delta}\, \mathbb{I}(f_\gamma > 1) \le C_1$, $\forall \gamma \in B_p$,
for some arbitrarily small $\delta > 0$. Such sufficient conditions could be easily achieved for
instance if the coefficients $x_j$ of the expansion of $X$ are independent. Then it suffices to
fix some $k \ge 1$ such that the density of $x_k$ is bounded and some small $c$ independent
of $p$ and to take $B_p = \{(\gamma_1, \dots, \gamma_k, \dots, \gamma_p) \in \mathcal{S}^p : |\gamma_k| \ge c\}$. This simple idea could
be useful in many other cases than the one of independent coefficients $x_j$. On the other
hand, we have to keep the variance estimate in the denominator of the test statistic (3.3)
away from zero. For this we have to ensure that $\mathbb{E}[h^{-1} K_h^2(\langle X_1 - X_2, \gamma \rangle)]$ is bounded away
from zero uniformly in $\gamma \in B_p$ as $p$ grows and $h$ decreases with the sample size. One easy
way to ensure this is to use again the Fourier transform properties and the positiveness of
$\mathcal{F}[K]$, and to impose a positive uniform lower bound for the integral of the square of $\mathcal{F}[f_\gamma]$
in a neighborhood of the origin, which necessarily induces a uniform lower bound for
$\int_{\mathbb{R}} f_\gamma^2$. Assumption D-(c)(iii) will complete the sufficient conditions for deriving standard
normal critical values using the central limit theorem for $U$-statistics of Guerre and
Lavergne (2005, Lemma 2). To summarize, the choice of $\beta_0$ and $B_p$ will be decided in the
applications and will also depend on the law of $X$ and the choice of the basis $\mathcal{R}$. In view
of our extensive simulation experiment, we argue that the choice of $B_p$ is not an issue
in applications: one can confidently perform the optimization on the whole hypersphere
$\mathcal{S}^p$. Finally, the condition $B_p \times 0_{p'-p} \subset B_{p'}$, $\forall p < p'$, is a mild technical condition that
combined with Lemma 2.1-(A) greatly simplifies the proof of the consistency of our test.

The first step is the study of the behavior of the process $Q_n(\gamma)$, $\gamma \in B_p$, under $H_0$
when $p$ is allowed to increase with the sample size.

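Lemma 3.1 Under Assumptions D and K and if $H_0$ holds true,

$$\sup_{\gamma \in B_p \subset \mathcal{S}^p} |Q_n(\gamma)| = O_P\left( n^{-1} h^{-1/2} p^{3/2} \ln n \right).$$

Moreover, if $\hat\tau_n^2(\gamma)$ is the estimate defined in equation (3.5),

$$\sup_{\gamma \in B_p \subset \mathcal{S}^p} \left\{ 1 / \hat\tau_n^2(\gamma) \right\} = O_P(1).$$

If in addition condition (3.8) holds true, $1 / \hat v_n^2 = O_P(1)$ with $\hat v_n^2$ defined in (3.7).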



We now describe the behavior of $\hat\gamma_n$ under $H_0$. A suitable rate $\alpha_n$ will make $\hat\gamma_n$
equal to $\gamma_0^{(p)}$ with high probability. Under the null, $\alpha_n$ has to grow to infinity sufficiently
fast to render the probability of the event $\{\hat\gamma_n = \gamma_0^{(p)}\}$ close to 1. We will see below that,
for better detection of the alternative hypothesis, $\alpha_n$ should grow as slowly as possible. Indeed,
slower rates for $\alpha_n$ will allow the selection of directions $\hat\gamma_n$ that could be better suited than
$\gamma_0^{(p)}$ for revealing the departure from the null hypothesis. The rate of $p$ is also involved
in the search of a trade-off for the rate of $\alpha_n$: larger $p$ slows down the rate of uniform
convergence to zero of $Q_n(\gamma)$, $\gamma \in B_p$, and hence requires larger $\alpha_n$.

Lemma 3.2 Under Assumptions D, K and condition (3.8) if the variance estimator is
the one defined in (3.7), for a positive sequence $\alpha_n$, $n \ge 1$, such that $\alpha_n / \{p^{3/2} \ln n\} \to \infty$,

$$\mathbb{P}(\hat\gamma_n = \gamma_0^{(p)}) \to 1, \quad \text{under } H_0.$$

The following result shows that the asymptotic critical values of our test statistic are
standard normal.

Theorem 3.3 Under the conditions of Lemma 3.2 and if the hypothesis $H_0$ in (1.1) holds
true, the test statistic $T_n$ converges in law to a standard normal. Consequently, the test
given by $\mathbb{I}(T_n \ge z_{1-\alpha})$, with $z_{1-\alpha}$ the $(1 - \alpha)$-quantile of the standard normal distribution,
has asymptotic level $\alpha$.
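The steps leading to the test of Theorem 3.3 can be put together as follows. This is a rough sketch under several assumptions that are not prescribed by the paper: the maximization in (3.2) is replaced by a crude random search over directions drawn uniformly on the hypersphere, the kernel is Gaussian, the standardization uses the minimum rule of (3.6) at every candidate direction, and the values of $h$, $\alpha_n$ and the number of candidates are arbitrary. The function name `projection_test` is hypothetical.

```python
import math
from statistics import NormalDist
import numpy as np

def gauss_kernel(u):
    return np.exp(-0.5 * u**2) / math.sqrt(2.0 * math.pi)

def pair_sums(U, proj, h):
    """Return Q_n and the estimate (3.5) for projections proj_i = <X_i, gamma>."""
    n = U.size
    K = gauss_kernel((proj[:, None] - proj[None, :]) / h)
    np.fill_diagonal(K, 0.0)
    q = U @ K @ U / (n * (n - 1) * h)
    tau2 = 2.0 * ((U**2) @ K**2 @ (U**2)) / (n * (n - 1) * h)
    return q, tau2

def projection_test(U, scores, gamma0, h, alpha_n, n_candidates=200, seed=0):
    """Penalized search (3.2) over random candidates, then T_n of (3.3)."""
    rng = np.random.default_rng(seed)
    n, p = scores.shape
    cands = rng.standard_normal((n_candidates, p))
    cands /= np.linalg.norm(cands, axis=1, keepdims=True)   # uniform on the sphere
    q0, tau2_0 = pair_sums(U, scores @ gamma0, h)
    best_gamma = gamma0
    best_stat = n * math.sqrt(h) * q0 / math.sqrt(tau2_0)   # no penalty at gamma0
    best_crit = best_stat
    for g in cands:
        q, tau2 = pair_sums(U, scores @ g, h)
        stat = n * math.sqrt(h) * q / math.sqrt(min(tau2, tau2_0))
        crit = stat - alpha_n                                # penalty for leaving gamma0
        if crit > best_crit:
            best_gamma, best_crit, best_stat = g, crit, stat
    p_value = 0.5 * math.erfc(best_stat / math.sqrt(2.0))   # one-sided normal p-value
    return best_stat, best_gamma, p_value

# Illustrative run: U is independent of the covariate, so the null holds.
rng = np.random.default_rng(42)
n, p = 100, 3
scores = rng.standard_normal((n, p))
U = rng.standard_normal(n)
gamma0 = np.array([1.0, 0.0, 0.0])
h = n ** (-0.2)
alpha_n = p**1.5 * math.log(n) ** 1.5
T_n, gamma_hat, p_value = projection_test(U, scores, gamma0, h, alpha_n)
reject = T_n >= NormalDist().inv_cdf(0.95)
```

With a very large penalty the search always returns $\gamma_0^{(p)}$, while with no penalty the statistic becomes a plain maximum over directions; the role of $\alpha_n$ is to sit between these two regimes.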



3.4 The behavior under the alternatives 

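First let us give an intuition on the reason why our test is consistent. Consider the
alternative hypothesis

$$H_1 : \mathbb{P}[\mathbb{E}(U \mid X) = 0] < 1.$$

The way the statistic $T_n$ is constructed guarantees the consistency of our test against $H_1$.
Indeed, we can write

$$T_n = \frac{nh^{1/2} Q_n(\hat\gamma_n)}{\hat v_n(\hat\gamma_n)} = \max_{\gamma \in B_p} \left\{ nh^{1/2} Q_n(\gamma) / \hat v_n(\gamma) - \alpha_n \mathbf{1}_{\{\gamma \ne \gamma_0^{(p)}\}} \right\} + \alpha_n \mathbf{1}_{\{\hat\gamma_n \ne \gamma_0^{(p)}\}}$$

$$\ge \max_{\gamma \in B_p} \frac{nh^{1/2} Q_n(\gamma)}{\hat v_n(\gamma)} - \alpha_n \ge \frac{nh^{1/2} Q_n(\gamma)}{\hat v_n(\gamma)} - \alpha_n, \qquad \forall \gamma \in B_p \subset \mathcal{S}^p, \qquad (3.9)$$

with $\hat v_n(\gamma_0^{(p)})$ equal to $\hat\tau_n(\gamma_0^{(p)})$ defined in (3.6) or equal to $\hat v_n$ defined in (3.7). Since
$\mathbb{E}(U^2 \mid X) \ge \underline{\sigma}^2$ and $\mathrm{Var}(U \mid \langle X, \gamma_0^{(p)} \rangle) \ge \underline{\sigma}^2$, it is clear that $1 / \hat v_n(\gamma_0^{(p)}) = O_P(1)$ for both
variance estimates introduced above. On the other hand, from Lemma 2.1, there exist
$p_0$ and $\tilde\gamma \in B_{p_0}$ such that the expectation of $Q_n(\tilde\gamma)$ stays away from zero as the sample
size grows to infinity and $h$ decreases to zero. Moreover, for any $p > p_0$ and any
$n$ and $h$, clearly $\max_{\gamma \in B_p} Q_n(\gamma) \ge Q_n(\tilde\gamma)$, because $B_{p_0} \times 0_{p - p_0} \subset B_p$. All these facts show
why our test is omnibus, that is consistent against nonparametric alternatives, provided
that $p \to \infty$.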
To formalize the consistency result, let us fix some real-valued function $\delta(X)$ such that
$\mathbb{E}[\delta(X)] = 0$ and $0 < \mathbb{E}[\delta^4(X)] < \infty$, and some sequence of real numbers $r_n$ that could
decrease to zero (the case $r_n \equiv 1$ is also included). Consider the sequence of alternatives

$$H_{1n} : U = U^0 + r_n \delta(X), \quad n \ge 1, \quad \text{with } \mathbb{E}(U^0 \mid X) = 0. \qquad (3.10)$$

We show below that such directional alternatives can be detected as soon as $r_n^2 n h^{1/2} / \alpha_n$
tends to infinity. This is exactly the same condition as in Lavergne and Patilea (2008).
However, in the functional data framework, to obtain the convenient standard normal
critical values, we need $1 / \alpha_n = o(p^{-3/2} \ln^{-1} n)$. Hence, the rate $r_n$ at which the alternatives
$H_{1n}$ tend to the null hypothesis should satisfy $r_n^2 n h^{1/2} / \{p^{3/2} \ln n\} \to \infty$.
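To make the interplay of these rates concrete, here is a worked example with one admissible set of choices. The specific values $h = n^{-1/5}$, $p = \lfloor \ln n \rfloor$, $\alpha_n = (\ln n)^3$ and $r_n = n^{-2/5}$ are illustrative assumptions and are not recommended in the paper; they are only checked against Assumption K and the two rate conditions stated above.

$$
\begin{aligned}
&h = n^{-1/5}: \quad h \to 0, \qquad \frac{(nh^2)^{\alpha}}{\ln n} = \frac{n^{3\alpha/5}}{\ln n} \to \infty \ \text{ for any } \alpha \in (0,1); \\
&p = \lfloor \ln n \rfloor: \quad p \ln^{-\lambda} n \le 1 \ \text{ with } \lambda = 1; \\
&\alpha_n = (\ln n)^3: \quad \frac{\alpha_n}{p^{3/2} \ln n} \ge \frac{(\ln n)^3}{(\ln n)^{5/2}} = (\ln n)^{1/2} \to \infty; \\
&r_n = n^{-2/5}: \quad \frac{r_n^2\, n h^{1/2}}{\alpha_n} = \frac{n^{-4/5} \cdot n \cdot n^{-1/10}}{(\ln n)^3} = \frac{n^{1/10}}{(\ln n)^3} \to \infty.
\end{aligned}
$$

With these choices, any sequence $r_n$ that decreases more slowly than $n^{-9/20} (\ln n)^{3/2}$ satisfies the detection condition, so the price paid for the growing dimension $p$ is a power of $\ln n$.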

Theorem 3.4 Suppose that

(a) Assumption D holds true with $U$ replaced by $U^0$;

(b) Assumption K is satisfied, and so is the condition (3.8) if the variance estimator is
the one defined in (3.7);

(c) $\alpha_n / \{p^{3/2} \ln n\} \to \infty$ and $r_n$, $n \ge 1$, is such that $r_n^2 n h^{1/2} / \alpha_n \to \infty$;

(d) $\mathbb{E}[\delta(X)] = 0$ and $0 < \mathbb{E}[\delta^4(X)] < \infty$.

Then the test based on $T_n$ is consistent against the sequence of alternatives $H_{1n}$ if there
exist $p \ge 1$ and $\gamma \in B_p$ such that $\mathbb{P}(\mathbb{E}[\delta(X) \mid \langle X, \gamma \rangle] = 0) < 1$ and one of the following
conditions is satisfied:

1. the density $f_\gamma$ is bounded;

2. the function $\mathbb{E}[\delta(X) \mid \langle X, \gamma \rangle = \cdot]\, f_\gamma(\cdot)$ is bounded;

3. the Fourier transform of $\mathbb{E}[\delta(X) \mid \langle X, \gamma \rangle = \cdot]\, f_\gamma(\cdot)$ is integrable on $\mathbb{R}$.

Let us recall that the existence of $p$ and $\gamma \in B_p$ such that $\mathbb{P}(\mathbb{E}[\delta(X) \mid \langle X, \gamma \rangle] = 0) < 1$ is
guaranteed by Lemma 2.1.

4 Testing the goodness-of-fit of parametric models 

Here we apply our projection-based testing methodology for testing the goodness-of-fit of 
the functional linear regression model against nonparametric alternatives satisfying mild 
technical conditions. Hence we provide a simple goodness-of-fit procedure for a widely 
used model. To the best of our knowledge, our results are completely new in the functional 
regression framework. 
Let $U$ be a real-valued random variable and $X$ be a random variable with values in
$L^2[0,1]$. The model we want to test is the functional linear model defined by

$$Y = a + \langle b, X \rangle + U,$$

where $b \in L^2[0,1]$ and $a \in \mathbb{R}$ are unknown parameters. The null hypothesis is

$$H_0 : \mathbb{E}(U \mid X) = 0 \quad \text{a.s.} \qquad (4.11)$$

In order to formally establish consistency against nonparametric alternatives, we will
consider a sequence of local alternatives like in (3.10).

Like in Assumption D we consider that $(U_1, X_1), \dots, (U_n, X_n)$ are independent copies
of $(U, X)$, but now the observations are $(Y_1, X_1), \dots, (Y_n, X_n)$. Hence the error term
$U$ has to be estimated in a preliminary step from the estimates of the parameters $a$
and $b$. We will investigate the behavior of our test statistic under the null and under the
alternatives for a generic estimate of the slope with suitable rate of convergence. Next, we
will get into the details in the standard case of the slope estimate based on the functional
principal component analysis. In particular, we will see how the difficulty of estimating
the parameters in the functional regression model could perturb the properties of the test.
To keep the technical conditions readable, hereafter we will assume that $\mathbb{E}(|U|^m) < \infty$
for any $m \ge 1$.



4.1 The test statistic and the behavior under the null hypothesis 

The test statistic is similar to the one we proposed for testing the effect of a functional
covariate. Let $\beta$, $\gamma_0^{(p)}$, $S^p$ and $B_p$ be defined as in Section 3. Let $\hat b \in L^2[0,1]$ denote a
generic estimator of the slope $b$ and let

$\hat a = \bar Y_n - \int_0^1 \hat b(t) \bar X_n(t)\,dt = a - \int_0^1 \{\hat b(t) - b(t)\} \bar X_n(t)\,dt + \bar U_n,$

where $\bar U_n = n^{-1} \sum_{i=1}^n U_i$. Let $\hat U_i = Y_i - \hat a - \langle \hat b, X_i\rangle$ be the residuals and let

$Q_n(\gamma; \hat a, \hat b) = \frac{1}{n(n-1)} \sum_{1 \le i \ne j \le n} \hat U_i \hat U_j \frac{1}{h} K_h(\langle X_i - X_j, \gamma\rangle), \quad \gamma \in S^p,$

where recall $K(\cdot)$ is a kernel, $h$ a bandwidth and $K_h(\cdot) = K(\cdot/h)$. Let $\hat v_n^2(\cdot; \hat a, \hat b)$ be an
estimate of the variance of $n h^{1/2} Q_n(\cdot; \hat a, \hat b)$ like in Section 3.2. Given $B_p \subset S^p$ with strictly
positive Lebesgue measure in $S^p$ that contains $\gamma_0^{(p)}$, the least favorable direction $\gamma$ for $H_0$
is defined as

$\hat\gamma_n = \arg\max_{\gamma \in B_p} \left[ n h^{1/2} Q_n(\gamma; \hat a, \hat b) / \hat v_n(\gamma; \hat a, \hat b) - \alpha_n I\{\gamma \ne \gamma_0^{(p)}\} \right].$ (4.12)

The test statistic is then

$T_n = n h^{1/2}\, Q_n(\hat\gamma_n; \hat a, \hat b) / \hat v_n(\hat\gamma_n; \hat a, \hat b).$ (4.13)

We will show that an asymptotic $\alpha$-level test is given by $I(T_n \ge z_{1-\alpha})$, where $z_{1-\alpha}$ is the
$(1-\alpha)$-quantile of the standard normal distribution.
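
To make the construction in (4.12) and (4.13) concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the variance estimate uses the common form $2\{n(n-1)h\}^{-1}\sum_{i \ne j} \hat U_i^2 \hat U_j^2 K_h^2(\cdot)$ as a stand-in for the estimators of Section 3.2, the covariates are represented by their first $p$ basis coefficients, and the maximization runs over a finite list of candidate directions.

```python
import numpy as np

def epanechnikov(x):
    """Kernel K(x) = 1 - x^2 on [-1, 1] and zero elsewhere."""
    return np.where(np.abs(x) <= 1.0, 1.0 - x**2, 0.0)

def standardized_statistic(resid, proj, h, kernel=epanechnikov):
    """Return n h^{1/2} Q_n / v_n for residuals and scalar projections <X_i, gamma>."""
    n = resid.shape[0]
    w = kernel((proj[:, None] - proj[None, :]) / h)  # K_h of pairwise differences
    np.fill_diagonal(w, 0.0)                         # the sums exclude i == j
    uu = np.outer(resid, resid)
    q = (uu * w).sum() / (n * (n - 1) * h)
    # ASSUMPTION: a standard variance estimate, standing in for Section 3.2.
    v2 = 2.0 * ((uu**2) * w**2).sum() / (n * (n - 1) * h)
    return n * np.sqrt(h) * q / np.sqrt(v2)

def penalized_test_statistic(resid, coefs, directions, gamma0, h, alpha_n):
    """coefs: (n, p) basis coefficients of the X_i; directions: (k, p) unit vectors.

    Every direction other than gamma0 pays the penalty alpha_n, as in (4.12);
    the returned value is the unpenalized statistic at the selected direction.
    """
    best_value, best_stat = -np.inf, None
    for gamma in np.vstack([gamma0, directions]):
        stat = standardized_statistic(resid, coefs @ gamma, h)
        penalty = 0.0 if np.allclose(gamma, gamma0) else alpha_n
        if stat - penalty > best_value:
            best_value, best_stat = stat - penalty, stat
    return best_stat
```

With a very large `alpha_n` the sketch returns the statistic at the privileged direction, and with `alpha_n = 0` it returns the maximum over all candidates, which mirrors the role of the penalization described in the text.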

To derive the standard normal behavior of the test statistic under the null, we will
show that under suitable conditions

$\sup_{\gamma \in S^p} n h^{1/2} |Q_n(\gamma; \hat a, \hat b) - Q_n(\gamma)| = o_P(1)$ and $\sup_{\gamma \in S^p} |\hat v_n(\gamma; \hat a, \hat b)/\hat v_n(\gamma) - 1| = o_P(1),$ (4.14)

with $Q_n(\gamma)$ and $\hat v_n(\gamma)$ defined like in Section 3, that is we will bring the problem back to
the case where the errors $U_i$ are observed.

Lemma 4.1 Assume the conditions of Theorem 3.3 are met, $E(|U|^m) < \infty$ for any $m \ge 1$,
and $\int_0^1 E[X^2(t)]\,dt < \infty$. Let $\hat b \in L^2[0,1]$ be an estimator of $b$ such that $\|\hat b - b\|_{L^2} =
O_P(n^{-\rho})$ for some $3/8 < \rho < 1/2$. Moreover, suppose that the bandwidth $h$ is such that
$n^{1-2\zeta} h^{1/2} \to 0$ for some $3/8 < \zeta < \rho$. Then, under the hypothesis $H_0$ the uniform rates of
convergence in (4.14) hold true.

At this stage it is worthwhile to notice an important difference between the functional
data framework and the finite-dimensional case. In the latter case the parameters of a
parametric regression model can usually be estimated at the rate $O_P(n^{-1/2})$, which
makes the equivalences (4.14) hold without any further conditions on the model.
In the functional covariate and functional parameter case, the rate of $\|\hat b - b\|$ depends on
the regularities of the covariate and of the slope parameter and is in general slower than
$O_P(n^{-1/2})$, see Hall and Horowitz (2007), Crambes, Kneip and Sarda (2009). To make the
differences $\hat U_i - U_i$ sufficiently small and hence to preserve the standard normal critical
values for $T_n$ defined in (4.13), one has to pay a price on the bandwidth $h$: slower rates of
$\|\hat b - b\|_{L^2}$ will require faster decreases for $h$, and this will result in a loss of power against
sequences of local alternatives. Below, we will investigate these aspects in more detail in
the case where the slope is estimated using the functional principal component approach.



Theorem 4.2 Under the conditions of Lemma 4.1 and if the hypothesis $H_0$ holds true,
the law of the test statistic $T_n$ is asymptotically standard normal. Consequently the test
given by $I(T_n \ge z_{1-\alpha})$, where $z_{1-\alpha}$ is the $(1-\alpha)$-quantile of the standard normal distribution,
has asymptotic level $\alpha$.

The proof of this theorem is a direct consequence of Lemma 4.1 and the arguments
we used for Theorem 3.3, therefore we will omit it.

There are several possibilities to estimate the parameters of a functional linear model.
Let us investigate our test in the case where the estimate $\hat b$ is obtained using the functional
principal component analysis (PCA) approach, which is a standard approach for estimating
the slope $b$; see for instance Ramsay and Silverman (2005) and Hall and Horowitz
(2007). For the sake of completeness, let us briefly recall this estimation procedure. Let
$\mathcal{K}(u,v) = \mathrm{Cov}(X(u), X(v))$, $\bar X_n = n^{-1} \sum_{i=1}^n X_i$ and

$\hat{\mathcal{K}}(u,v) = \frac{1}{n} \sum_{i=1}^n (X_i(u) - \bar X_n(u))(X_i(v) - \bar X_n(v)).$

Write the spectral expansions of $\mathcal{K}$ and $\hat{\mathcal{K}}$ as

$\mathcal{K}(u,v) = \sum_{j=1}^\infty \theta_j \phi_j(u) \phi_j(v), \qquad \hat{\mathcal{K}}(u,v) = \sum_{j=1}^\infty \hat\theta_j \hat\phi_j(u) \hat\phi_j(v),$

where

$\theta_1 > \theta_2 > \dots > 0, \qquad \hat\theta_1 \ge \hat\theta_2 \ge \dots \ge 0$

are the eigenvalue sequences of the operators with kernel $\mathcal{K}$ and $\hat{\mathcal{K}}$, respectively, and
$\phi_1, \phi_2, \dots$ and $\hat\phi_1, \hat\phi_2, \dots$ are the respective orthonormal eigenfunction sequences. The
linear operator corresponding to $\mathcal{K}$ is defined by $(\mathcal{K}f)(u) = \int \mathcal{K}(u,v) f(v)\,dv$. We have
$\mathcal{K}b = g$ where $g(u) = E[(Y - E(Y))(X(u) - E(X(u)))]$. This suggests the estimator

$\hat b(t) = \sum_{j=1}^m \hat b_j \hat\phi_j(t), \quad t \in [0,1],$ (4.15)

where the truncation point $m$ is a smoothing parameter, $\hat b_j = \hat\theta_j^{-1} \hat g_j$, $\hat g_j = \langle \hat g, \hat\phi_j\rangle$ and

$\hat g(t) = n^{-1} \sum_{i=1}^n (Y_i - \bar Y_n)(X_i(t) - \bar X_n(t))$ (4.16)

with $\bar Y_n = n^{-1} \sum_{i=1}^n Y_i$.
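
The estimation steps (4.15) and (4.16) can be sketched numerically as follows. This is a minimal illustration, assuming the curves are observed on an equispaced grid of $[0,1]$ and that integrals are replaced by Riemann sums with weight `dt`; the function name `fpca_slope` and the discretization are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def fpca_slope(X, Y, m):
    """Functional PCA estimates of the intercept a and the slope b.

    X : (n, T) array, curves sampled on an equispaced grid of [0, 1]
    Y : (n,) array of responses
    m : truncation point (number of principal components kept)
    """
    n, T = X.shape
    dt = 1.0 / T                                  # ASSUMPTION: Riemann-sum weight
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean()
    K_hat = Xc.T @ Xc / n                         # empirical covariance on the grid
    g_hat = Xc.T @ Yc / n                         # empirical cross-covariance (4.16)
    evals, evecs = np.linalg.eigh(K_hat * dt)     # discretized covariance operator
    top = np.argsort(evals)[::-1][:m]             # m largest eigenvalues
    theta = evals[top]
    phi = evecs[:, top] / np.sqrt(dt)             # so that sum(phi_j**2) * dt == 1
    g_j = phi.T @ g_hat * dt                      # coefficients <g_hat, phi_j>
    b_hat = phi @ (g_j / theta)                   # slope estimate (4.15) on the grid
    a_hat = Y.mean() - (b_hat * X.mean(axis=0)).sum() * dt
    return a_hat, b_hat
```

The residuals then follow as `Y - a_hat - (X @ b_hat) * dt`. The division by `theta` is what makes small estimated eigenvalues costly, which is why the truncation point `m` acts as a smoothing parameter.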

For simplicity, hereafter the orthonormal basis $\mathcal{R}$ introduced in Section 2 is the basis
composed of the sequence of orthonormal eigenfunctions $\phi_1, \phi_2, \dots$ of the covariance
operator $\mathcal{K}$. Hence $X(t) = \sum_{j=1}^\infty x_j \phi_j(t)$, where the random coefficients $x_j = \langle X, \phi_j\rangle$. The
following assumptions are standard conditions on the covariance operator $\mathcal{K}$ and the slope
parameter in the linear model, as can be found in Hall and Horowitz (2007).

Assumption P

(a) The covariate $X$ has finite fourth moment, that is $\int_0^1 E[X^4(t)]\,dt < \infty$; moreover for
some constant $C > 1$, $E[x_j - E(x_j)]^4 \le C\theta_j^2$ for all $j$.

(b) The errors $U_i$ are identically distributed, independent of $X_i$, with zero mean and
finite variance.

(c) The eigenvalues $\theta_j$ of the covariance operator $\mathcal{K}$ satisfy

$\theta_j - \theta_{j+1} \ge C^{-1} j^{-\alpha-1}, \quad \forall j \ge 1.$

(d) The Fourier coefficients $b_j$ satisfy

$|b_j| \le C j^{-\beta}$

and $\alpha > 1$, $\tfrac{3}{2}\alpha + 2 < \beta$.

The condition $\tfrac{3}{2}\alpha + 2 < \beta$ replaces condition $\tfrac{1}{2}\alpha + 1 < \beta$ of Hall and Horowitz (2007)
in order to reconcile the various requirements on the bandwidth $h$. For comparison, see
also Theorem 4.1 of Cai and Hall (2006) where $\beta > \alpha + 2$ is required. Hall and Horowitz
(2007) show that if Assumption P holds true and if $m \asymp n^{1/(\alpha+2\beta)}$, then

$\|\hat b - b\|_{L^2}^2 = O_P\left(n^{-(2\beta-1)/(\alpha+2\beta)}\right),$

and this rate is optimal in a minimax sense. In this case $\rho = (2\beta - 1)/\{2(\alpha + 2\beta)\}$
and the condition $\rho > 3/8$ of Theorem 4.2, which guarantees a nonempty range for the
bandwidth, becomes $\tfrac{3}{2}\alpha + 2 < \beta$, that is Assumption P-(d).
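
The translation of $\rho > 3/8$ into a condition on $\alpha$ and $\beta$ is elementary algebra. The short derivation below is added for convenience; it uses only the expression $\rho = (2\beta - 1)/\{2(\alpha + 2\beta)\}$ given above and the fact that $\alpha + 2\beta > 0$.

```latex
\rho = \frac{2\beta - 1}{2(\alpha + 2\beta)} > \frac{3}{8}
\iff 4(2\beta - 1) > 3(\alpha + 2\beta)
\iff 2\beta > 3\alpha + 4
\iff \beta > \tfrac{3}{2}\alpha + 2 .
```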

4.2 The behavior under the alternatives 

The alternatives of the functional linear model we consider are of the form

$H_{1n} : Y_{in} = a + \langle b, X_i\rangle + r_n \delta(X_i) + U_i^0, \quad 1 \le i \le n,\ n \ge 1$, with $E(U_i^0 \mid X_i) = 0$, (4.17)

with $\delta(\cdot)$ a real-valued function such that $0 < E[\delta^4(X)] < \infty$ and $r_n$, $n \ge 1$ a sequence
of real numbers.

To be able to investigate the behavior of the test statistic under the alternatives, first
we have to analyze the behavior of $\hat b$, the estimator of $b$. To keep the paper at a reasonable
length, hereafter we consider that $\hat b$ is the estimator obtained through the functional PCA
approach. In Lemma 4.3 below we derive the rate of convergence of $\hat b$ towards $b$ under the
alternatives $H_{1n}$, provided that the function $\delta(\cdot)$ satisfies the orthogonality conditions

$E[\delta(X)] = 0$ and $E[\delta(X)X] = 0.$ (4.18)

Such orthogonality conditions are quite common in nonparametric testing, see for instance
equation (3.11) in Guerre and Lavergne (2005), and they allow us to focus on the performance
of the test in detecting departures from the model.

Lemma 4.3 Assume that $X_1, \dots, X_n$ are independent draws from $X$, $\int_0^1 E[X^2(t)]\,dt < \infty$
and condition (4.18) holds true. Let $\hat b$ (resp. $\hat b^0$) be the estimator defined in (4.15) obtained
from data generated according to the model (4.17) with a bounded sequence $r_n \ge 0$, $n \ge 1$
(resp. with $r_n = 0$ for all $n \ge 1$). Then

$\|\hat b^0 - \hat b\|_{L^2}^2 = O_P(r_n^2 n^{-1}) \sum_{j=1}^m \hat\theta_j^{-2}.$

If in addition Assumption P holds true and $m \asymp n^{1/(\alpha+2\beta)}$, then

$\int_0^1 \{\hat b(t) - b(t)\}^2\,dt = O_P\left(n^{-(2\beta-1)/(\alpha+2\beta)}\right) + o_P(r_n^2).$

Let us note that no moment condition for $U^0$ is needed for the proof of the first
part of Lemma 4.3. Moreover, let us point out that we will not need to investigate the
convergence rate for the estimator of $a$ under the alternatives since by construction

$\hat a - a = - \int_0^1 \{\hat b(t) - b(t)\} \bar X_n(t)\,dt + \bar U_n.$

Now, we can analyze the behavior of the test statistic under the alternatives (4.17). The
estimated residuals $\hat U_i$ can be decomposed as

$\hat U_i = U_i^0 + r_n \delta(X_i) - \langle \hat b - b, X_i - \bar X_n\rangle - r_n \overline{\delta(X)}_n - \bar U_n^0.$ (4.19)
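
For the reader's convenience, the decomposition (4.19) can be checked in a few lines. The derivation below is a sketch that only uses $\hat a = \bar Y_n - \langle \hat b, \bar X_n\rangle$ and the model (4.17); the notation $\overline{\delta(X)}_n = n^{-1}\sum_{i=1}^n \delta(X_i)$ and $\bar U_n^0 = n^{-1}\sum_{i=1}^n U_i^0$ is assumed here.

```latex
\begin{align*}
\hat U_i &= Y_i - \hat a - \langle \hat b, X_i\rangle
          = (Y_i - \bar Y_n) - \langle \hat b, X_i - \bar X_n\rangle \\
         &= \langle b, X_i - \bar X_n\rangle
            + r_n\{\delta(X_i) - \overline{\delta(X)}_n\}
            + U_i^0 - \bar U_n^0
            - \langle \hat b, X_i - \bar X_n\rangle \\
         &= U_i^0 + r_n \delta(X_i)
            - \langle \hat b - b, X_i - \bar X_n\rangle
            - r_n \overline{\delta(X)}_n - \bar U_n^0 .
\end{align*}
```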

Theorem 4.4 Consider the sequence of alternative hypotheses (4.17) with a nonzero
function $\delta$ satisfying (4.18) and $0 < E[\delta^4(X)] < \infty$. Let $\hat b \in L^2[0,1]$ be an estimator
of the slope parameter $b$. Suppose that the conditions of Theorem 4.2 are met with $U$
replaced by $U^0$. Moreover, assume that $\hat b$, the sequence $r_n$, $n \ge 1$, the sequence $\alpha_n$, $n \ge 1$
and the bandwidth $h$ satisfy the additional conditions:

(i) $r_n^2 n h^{1/2} / \alpha_n \to \infty$;

(ii) $r_n^{-1} \|\hat b - b\|_{L^2} = o_P(1)$;

(iii) $\alpha_n / \{p^{3/2} \ln n\} \to \infty$.

Then the test based on $T_n$ defined in (4.13) will reject the functional linear regression
model with probability tending to 1, provided there exists $p \ge 1$ and $\gamma \in B_p$ such that at
least one of conditions (1) to (3) of Theorem 3.4 holds true.



If Assumption P holds true, condition (ii) of Theorem 4.4 indicates that the test
can detect only local alternatives $H_{1n}$ that approach the null hypothesis slower than
$n^{-(2\beta-1)/\{2(\alpha+2\beta)\}}$. Meanwhile, in order to detect the fastest possible alternatives, the
bandwidth should decrease to zero as slowly as allowed by condition (i), that is faster
than $n^{-2(\alpha+1)/(\alpha+2\beta)}$ times a power of $\ln n$, provided the dimension $p$ and $\alpha_n$ increase as
fast as a power of $\ln n$ such that condition (iii) is met.

5 Empirical analysis



5.1 Bootstrap procedures 

To improve the critical values of the test statistic $T_n$ with small samples we consider a
wild bootstrap procedure that can be applied in both cases we consider herein: the $U_i$'s
are observed or the $U_i$'s are estimated by some $\hat U_i$'s. A bootstrap sample is denoted by
$U_i^*$ or $\hat U_i^*$, $1 \le i \le n$. The wild bootstrap procedure we propose is inspired by Mammen
(1993): $U_i^* = Z_i U_i$ (resp. $\hat U_i^* = Z_i \hat U_i$), $1 \le i \le n$, with $Z_i = V_i/\sqrt{2} + (V_i^2 - 1)/2$ and
$V_i$ independent standard normal variables independent from the original observations. A
bootstrap test statistic is built from a bootstrap sample as was the original test statistic.
When this scheme is repeated many times, the bootstrap critical value $z^*_{1-\alpha,n}$ at level $\alpha$ is
the empirical $(1-\alpha)$-th quantile of the bootstrapped test statistics. This critical value
is then compared to the initial test statistic.
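
The resampling scheme just described can be sketched as follows. This is a minimal illustration: the helper names are hypothetical, and `statistic` stands for any user-supplied function that rebuilds the test statistic from a vector of bootstrap errors or residuals (how it does so is left to the caller).

```python
import numpy as np

def wild_multipliers(n, rng):
    """Draw Z_i = V_i / sqrt(2) + (V_i^2 - 1) / 2 with V_i standard normal."""
    v = rng.standard_normal(n)
    return v / np.sqrt(2.0) + (v**2 - 1.0) / 2.0

def bootstrap_critical_value(resid, statistic, level=0.05, n_boot=199, seed=0):
    """Empirical (1 - level) quantile of the bootstrapped test statistics.

    statistic : hypothetical callable mapping a residual vector to a float.
    """
    rng = np.random.default_rng(seed)
    boot_stats = np.empty(n_boot)
    for b in range(n_boot):
        boot_resid = wild_multipliers(resid.shape[0], rng) * resid
        boot_stats[b] = statistic(boot_resid)
    return np.quantile(boot_stats, 1.0 - level)
```

The multipliers have mean zero and unit variance, so the bootstrap errors keep the first two conditional moments of the original errors, which is what allows heteroscedasticity of unknown form to be reproduced.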



5.2 Simulation study 

An extensive simulation study was carried out. For reasons of brevity, only a summary of 
the main results and conclusions is given here. We will focus on the goodness-of-fit test 
of parametric models. 

Let us start with the functional linear model, as considered in Section 4, given by

$Y = a + \langle b, X\rangle + U.$

The function $X$ is drawn as a Brownian motion, $b \in L^2[0,1]$ and $a \in \mathbb{R}$ are parameters to
be estimated, and $U = \delta(X) + U^0$, where $\delta(X)$ is the deviation from the null hypothesis,
and $U^0$ is the error. For the parameters, $b(t) = 1$ for all $t \in [0,1]$ and $a = 0$ were taken
as the true values.

A sample $(Y_1, X_1), \dots, (Y_n, X_n)$ of size $n = 200$ will be drawn from this model, that is,

$Y_i = a + \langle b, X_i\rangle + \delta(X_i) + U_i^0, \quad 1 \le i \le n,$

where $U_1^0, \dots, U_n^0$ are independent standard normal variables, also independent of the $X_i$.

The first scenario will be the goodness-of-fit of the functional linear model versus a
quadratic type deviation

$\delta_Q(X) = c \left( \int_0^1 \int_0^1 X(s) X(t)\,ds\,dt - 1/3 \right),$

where $c = 0$ under the null hypothesis and $c = 0.6$ under the alternative. The second
scenario will be the goodness-of-fit of the functional linear model versus a cubic deviation

$\delta_C(X) = d \left( \int_0^1 \int_0^1 \int_0^1 X(s) X(t) X(z)\,ds\,dt\,dz - \int_0^1 X(t)\,dt \right),$ (5.20)

where $d = 0$ under the null and $d = 0.9$ under the alternative. Note that the two functions
$\delta_Q(\cdot)$ and $\delta_C(\cdot)$ satisfy the orthogonality conditions (4.18). In these two scenarios, the PCA
estimator of the functional linear model, studied in Hall and Horowitz (2007), is used.
Let us recall that the Karhunen-Loève expansion of the Brownian motion $X$ is given
by

$X(t) = \sum_{j=1}^\infty \frac{x_j}{(j - 0.5)\pi}\, \rho_j(t),$

where $x_j$ are independent standard normal coefficients, $\mathcal{R} = \{\rho_j(t) = \sqrt{2} \sin((j - 0.5)\pi t) :
j \in \{1, 2, \dots\}\}$ constitutes an orthonormal basis of eigenfunctions, and $1/((j - 0.5)^2 \pi^2)$
are the eigenvalues. We made use of this basis $\mathcal{R}$, and took different values of $p$, the number
of basis elements. Other bases were also checked. The role played by the basis and the
dimension $p$ is to allow both the covariate function $X$ and the alternative to be
approximated. A good basis is one which provides a good approximation with a small
dimension $p$. The Karhunen-Loève basis is obviously a good basis to approximate the
covariate function.
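
As an illustration of the simulation design, the sketch below draws approximate Brownian paths from a truncated Karhunen-Loève expansion and evaluates the two deviations. The truncation level `n_terms`, the midpoint grid and the function names are assumptions made for the example; the paper does not state how the paths were discretized. The code uses the fact that the double and triple integrals equal the square and the cube of $\int_0^1 X(t)\,dt$.

```python
import numpy as np

def simulate_brownian_kl(n, grid, n_terms=50, seed=0):
    """Approximate Brownian paths on `grid` from a truncated KL expansion."""
    rng = np.random.default_rng(seed)
    freq = (np.arange(1, n_terms + 1) - 0.5) * np.pi          # (j - 0.5) * pi
    basis = np.sqrt(2.0) * np.sin(np.outer(freq, grid))      # rho_j(t), shape (n_terms, T)
    scores = rng.standard_normal((n, n_terms))                # x_j, i.i.d. N(0, 1)
    return (scores / freq) @ basis                            # shape (n, T)

def quadratic_deviation(X, c):
    """delta_Q(X) = c * ((int X)^2 - 1/3), integral approximated by the grid mean."""
    integral = X.mean(axis=1)
    return c * (integral**2 - 1.0 / 3.0)

def cubic_deviation(X, d):
    """delta_C(X) = d * ((int X)^3 - int X), integral approximated by the grid mean."""
    integral = X.mean(axis=1)
    return d * (integral**3 - integral)

# Example: one sample from the quadratic alternative with a = 0 and b(t) = 1.
grid = (np.arange(200) + 0.5) / 200
X = simulate_brownian_kl(200, grid, seed=1)
Y = X.mean(axis=1) + quadratic_deviation(X, 0.6) + np.random.default_rng(2).standard_normal(200)
```

Since $\int_0^1 X(t)\,dt$ is centered Gaussian with variance $1/3$, both deviations have mean zero, in line with the first orthogonality condition in (4.18).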

Several possible choices were studied for the privileged direction $\gamma_0^{(p)}$. Here we present
results with an uninformative one, with the same coefficients in all basis elements.

Different values for the penalization were considered. Since the statistic is standardized
before penalization, natural values for $\alpha_n$ are 3, 4, 5 or 6. Small values of the penalization
provide results that are similar to those obtained with the direction maximizing the
standardized statistic, that is, $\arg\max_{\gamma \in S^p} n h^{1/2} Q_n(\gamma)/\hat v_n(\gamma)$, while larger values of the
penalization lead to results similar to those obtained with the chosen direction $\gamma_0^{(p)}$. The
results presented here correspond to the penalization $\alpha_n = 5$.

To compute the statistic for each direction, we used the Epanechnikov kernel, $K(x) =
1 - x^2$ for $x \in [-1,1]$. A grid of bandwidths was used in order to explore the effect of the
bandwidth on the power of the test.

To estimate the conditional variance, the two estimators (3.5) and (3.7) were considered.
For the estimator (3.7), a kernel estimator of the errors' conditional variance was
used, with uniform kernel and bandwidth $h_v = 0.5 n^{-1/6}$. We observed a better power
under the alternative with the estimator (3.7), so the results will be given with this
estimator.

For the optimization in the hypersphere $S^p$, a grid of 300 points was used in the case of
$p = 3$ dimensions, and a grid of 1280 points in the case of $p = 5$ dimensions. Additionally,
a local refinement of the optimum was used, with 9 points in dimension $p = 3$ and 81
points in dimension $p = 5$. For each original sample, we used 199 bootstrap samples to
compute the critical value. One thousand original samples were used to approximate the
percentages of rejection. The results shown in the figures below were obtained with dimension
$p = 3$.

Figure 1 shows the empirical powers obtained for a grid of values of the bandwidth
both under the null hypothesis of the functional linear model and under the quadratic alternative.
We observe that the power is not very much affected by the bandwidth around a possibly
optimal value.

For purposes of comparison, the empirical power of the test of Horvath and Reeder (2011)
(HR test for brevity) is also shown. These authors proposed a test of significance of
the quadratic effect under a functional quadratic model. Note that the HR test is specially
designed to detect quadratic alternatives to the linear model, such as the one proposed here
as the first scenario. As expected, the HR test is more powerful than our test, especially for
dimension 1. This dimension represents the number of components in the estimation of
the functional linear model, which in the case of the HR test coincides with the dimension
used in the test statistic to estimate the quadratic deviation. The HR test loses power when
the dimension increases as a consequence of a larger noise in the test statistic.

For our test, m = 3 was used for the estimation of the functional linear model and 
p = 3 was taken for the number of basic elements. Simulations were also carried out 
with other values of m and p, and we observed that our test provides similar power when 
increasing each of these dimensions. The reason is that our test is not very much affected 
by the noise coming from increasing dimension, and this allows for a bigger dimension 
and consequently a better approximation of the deviation. It could be said that our 
test reaches a better trade-off between the noise coming from dimensionality and the 
approximation of the linear model and the alternative. 



[Figure 1 plot: empirical power against bandwidth (horizontal axis from 0.85 to 1.25). Legend: HR test Dim 1; HR test Dim 3; Our test c=0.6; Our test c=0.]



Figure 1: Testing the functional linear model versus a quadratic alternative. 



Figure 2 shows the power of our test under the second scenario, where a functional
linear model is tested versus a cubic alternative. As expected, the HR test is not very powerful
since it was not designed to detect this kind of alternative. The power of our test is good
in a wide range of values of the bandwidth and the level is very well respected under the
null hypothesis.

Figure 2: Testing the functional linear model versus a cubic alternative.



As an illustration of the behavior of our test for the goodness-of-fit of a more general
parametric model, we considered the goodness-of-fit of the quadratic functional model,
and obtained percentages of rejection under the null hypothesis and under the cubic
alternative. That is, the simulated model is

$Y = a + \int_0^1 b(t) X(t)\,dt + \int_0^1 \int_0^1 h(s,t) X(s) X(t)\,ds\,dt + \delta(X) + U^0,$

which consists of a quadratic functional model, as considered in Yao and Müller (2010)
and Horvath and Reeder (2011), plus a deviation represented by the function $\delta(\cdot)$. Here
$b(t) = 1$ for all $t \in [0,1]$, and $h(s,t) = 0.6$ for all $s,t \in [0,1]$, which are the same linear
and quadratic effects considered before. The deviation was chosen to be $\delta = \delta_C$, that is,
the cubic deviation already considered in (5.20). Then, the idea is to carry out a
goodness-of-fit test of the functional quadratic model, and evaluate its performance under the
null $d = 0$ and under the cubic alternative $d = 0.9$. Results are shown in Figure 3. To
the best of our knowledge there is no parametric test for comparison in this situation.

The results show a somewhat conservative behavior for large bandwidths, while the power
is generally good, with not much effect coming from the bandwidth. Results were obtained
for different values of $p$ and the dimension of the parametric estimator, with generally
good and expected outcomes. Further investigation will be provided elsewhere.

Figure 3: Testing the functional quadratic model versus a cubic alternative.



5.3 Application to real data 

The test proposed here is applied to the data set collected by Tecator and available at
http://lib.stat.cmu.edu/datasets/tecator. The task is to predict the fat content of a meat
sample on the basis of its near infrared absorbance spectrum. For each sample of finely
chopped pure meat, a 100 channel spectrum of absorbances was recorded using a Tecator
Infratec Food and Feed Analyzer, a spectrometer that works in the wavelength range
850-1050 nm. These absorbances can be thought of as a discrete approximation to the
continuous record, $X_i(t)$. Also, for each sample of meat, the fat content, $Y_i$, was measured
by analytic chemistry. The data set contains 240 samples of meat.

Yao and Müller (2010) proposed using a functional quadratic model to predict the
fat content, $Y_i$, of a meat sample based on its absorbance spectrum, $X_i(t)$. Horvath and Reeder
(2011) applied their parametric test to check whether the quadratic term is needed, versus
the null hypothesis of a functional linear model. They reached the conclusion that the
quadratic effect is significant, and hence that the functional quadratic model is needed.

We will apply the test proposed here, first to check the goodness-of-fit of the functional
linear model, and later the goodness-of-fit of the functional quadratic model. Table 1
below contains the p-values corresponding to our test for different values of the bandwidth,
the parameter $m$ for model estimation and the dimension $p$. We can conclude that both the
functional linear and the functional quadratic models should be rejected for the Tecator
data set. This conclusion confirms the empirical results of Chen, Hall and Müller (2011)
who proposed an additive double index model. Indeed, the link functions estimated
by Chen, Hall and Müller do not show respective linear and quadratic patterns, which
indicates that the usual functional linear and functional quadratic models do not
provide a satisfactory fit.

                        Linear model                Quadratic model
          h        0.18  0.30  0.44  0.59       0.18  0.30  0.44  0.59
p = 2   m = 1       0.5   0.4   0.2   0.6        2.4   1.4   1.6   3.3
        m = 2       0.2   0.0   0.0   0.3        0.6   0.3   0.0   0.7
        m = 3       0.0   0.0   0.0   0.0        0.0   0.0   0.0   0.0
p = 3   m = 1       0.0   0.0   0.2   0.2        0.0   0.1   0.1   0.0
        m = 2       0.0   0.0   0.0   0.1        0.2   0.0   0.1   0.0
        m = 3       0.0   0.0   0.0   0.0        0.0   0.0   0.0   0.0

Table 1. p-values (in percentages) obtained by applying the new test to the Tecator data
set.

6 Appendix



6.1 Dimension reduction: proof of the fundamental lemma 

Proof of Lemma 2.1. (A). The implication (1) $\Rightarrow$ (2) is obvious. To prove (2) $\Rightarrow$ (1),
note first that for any $\beta \ne 0$, the $\sigma$-field generated by $\langle X, \beta\rangle$ is the same as the $\sigma$-field
generated by $\langle X, \beta\rangle / \|\beta\|_{L^2}$. Next, by elementary properties of the conditional expectation,
we obtain that for any $\beta \in L^2[0,1]$, including $\beta = 0$,

$0 = E[\exp\{i\langle X, \beta\rangle\} E(Z \mid \langle X, \beta\rangle)]$
$\ = E[\exp\{i\langle X, \beta\rangle\} Z]$
$\ = E[\exp\{i\langle X, \beta\rangle\} E(Z \mid X)].$ (6.1)

Write $Z = Z^+ - Z^-$ where $Z^+$ and $Z^-$ are the positive and negative parts of $Z$, and deduce
that for any $\beta$, $E[\exp\{i\langle X, \beta\rangle\} E(Z^+ \mid X)] = E[\exp\{i\langle X, \beta\rangle\} E(Z^- \mid X)]$. As distinct
positive finite measures cannot have the same characteristic function, see for instance
Theorem 3.1 of Parthasarathy (1967), this implies that $E(Z^+ \mid X) = E(Z^- \mid X)$ and
hence $E(Z \mid X) = 0$ almost surely. For (2) $\Rightarrow$ (3) it suffices to identify $\gamma$ with an element
in $L^2[0,1]$ of norm 1. To prove (3) $\Rightarrow$ (1), fix arbitrarily $\beta \in L^2[0,1]$, $\beta \ne 0$. For any
$p \ge 1$, let $\beta^{(p)}$ be the projection of $\beta$ on the subspace generated by the first $p$ elements
of the basis $\mathcal{R}$. For any $p$ sufficiently large such that $\|\beta^{(p)}\| = \|\beta^{(p)}\|_{L^2} > 0$ we have
$E(Z \mid \langle X, \beta^{(p)}\rangle) = E(Z \mid \langle X, \beta^{(p)}/\|\beta^{(p)}\|\rangle) = 0$, where for the last equality we use the fact
that $\beta^{(p)}/\|\beta^{(p)}\| \in S^p$ and (3). By elementary properties of the conditional expectation,

$0 = E[\exp\{i\langle X, \beta^{(p)}\rangle\} E(Z \mid \langle X, \beta^{(p)}\rangle)]$
$\ = E[\exp\{i\langle X, \beta\rangle\} Z \exp\{i\langle X, \beta^{(p)} - \beta\rangle\}]$
$\ = E[\exp\{i\langle X, \beta\rangle\} E(Z \mid X) [\exp\{i\langle X, \beta^{(p)} - \beta\rangle\} - 1]] + E[\exp\{i\langle X, \beta\rangle\} E(Z \mid X)].$

From the Taylor expansion with integral remainder and elementary calculus, one obtains
that $\forall x \in \mathbb{R}$, $|\exp(ix) - 1| \le \min\{|x|, 2\}$. From this and the Cauchy-Schwarz inequality,
deduce that for any $p$,

$|\exp\{i\langle X, \beta^{(p)} - \beta\rangle\} - 1| \le \min\{\|X\|_{L^2} \|\beta^{(p)} - \beta\|_{L^2}, 2\}.$

Since $\|\beta^{(p)} - \beta\|_{L^2} \to 0$ when $p \to \infty$, by the Lebesgue Dominated Convergence Theorem it
follows that necessarily

$E[\exp\{i\langle X, \beta\rangle\} E(Z \mid X)] = 0.$

Since $\beta \in L^2[0,1]$ was arbitrarily fixed, apply again the arguments we used after equation
(6.1) to deduce that (1) holds true. The equivalence (3) $\Leftrightarrow$ (4) follows from Lemma
2.1-(A) of Lavergne and Patilea (2008) applied for each $p$.

(B). From (A)-4 above, there exists some $p_0 \ge 1$ such that $P[E(Z \mid X^{(p_0)}) = 0] < 1$.
On the other hand, by the property of iterated expectations, for any $p > p_0$,

$E(Z \mid X^{(p_0)}) = E[E(Z \mid X^{(p)}) \mid X^{(p_0)}].$

Thus necessarily $P[E(Z \mid X^{(p)}) = 0] < 1$, $\forall p > p_0$. Fix arbitrarily $p > p_0$ and notice that
for any $b \in [-1, 1]$,

$\{\gamma \in S^p : E(Z \mid \langle X, \gamma\rangle) = 0 \text{ a.s.}\} \subset \{\gamma \in S^p : E(Z \exp\{b\langle X, \gamma\rangle\}) = 0\}.$

The expectations in the sets in the last display are well-defined since

$E(|Z \exp\{b\langle X, \gamma\rangle\}|) \le E(|Z| \exp\{|b|\,|\langle X, \gamma\rangle|\}) \le E(|Z| \exp\{\|X\|\}) < \infty.$

Let us notice that

$\{b\gamma : b \in [-1,1],\ \gamma \in S^p,\ E(Z \exp\{b\langle X, \gamma\rangle\}) = 0\} \subset A_p$

where

$A_p := \{\gamma \in \mathbb{R}^p : \|\gamma\| \le 1,\ E(Z \exp\{\langle X^{(p)}, \gamma\rangle\}) = 0\}.$

Thus, to prove (B) it suffices to show that the set $A_p$ has Lebesgue measure zero in $\mathbb{R}^p$
and is not dense in the unit ball of $\mathbb{R}^p$ [1].

For this purpose, we will use the following property: if $W_1$ and $W_2$ are real-valued
random variables such that $E(|W_1| \exp\{a|W_2|\}) < \infty$ for some $a > 1$, then

$P(E(W_1 \mid W_2) = 0) < 1 \ \Rightarrow\ $ the set $\{|b| \le a : E(W_1 \exp\{bW_2\}) = 0\}$ is empty or finite. (6.2)

To prove this property, decompose $W_1 = W_1^+ - W_1^-$ and use the positive part to
define

$\lambda^+(b) = E(W_1^+ \exp\{bW_2\}) = E(E(W_1^+ \mid W_2) \exp\{bW_2\}) = \int_{\mathbb{R}} \exp\{bw\}\,d\mu^+(w),$

$|b| \le a$, where $d\mu^+(w) = E(W_1^+ \mid W_2 = w)\,dF_{W_2}(w)$ and $F_{W_2}$ is the probability distribution
function of $W_2$. Use the negative part of $W_1$ to define $\lambda^-(b)$, $|b| \le a$ similarly. Since
$W_1$ is integrable, $\mu^-(\mathbb{R}), \mu^+(\mathbb{R}) < \infty$. The functions $\lambda^-(\cdot)/\mu^-(\mathbb{R})$ and $\lambda^+(\cdot)/\mu^+(\mathbb{R})$ are
the Laplace transforms of the probability distributions $\mu^-/\mu^-(\mathbb{R})$ and $\mu^+/\mu^+(\mathbb{R})$. The
condition $E(|W_1| \exp\{a|W_2|\}) < \infty$ implies that these Laplace transforms, and hence
$\lambda^+(\cdot)$ and $\lambda^-(\cdot)$, are (real) analytic on the domain $(-a, a)$. See for instance Proposition
8.4.4 in Chow and Teicher (1997). Notice that the set in (6.2) is the set of $b \in (-a, a)$
for which $\lambda^-(b) = \lambda^+(b)$. If $P(E(W_1 \mid W_2) = 0) < 1$, $\lambda^+(\cdot)$ and $\lambda^-(\cdot)$ cannot coincide on
$(-a, a)$. Thus the set in (6.2) contains only isolated points $b$ from the interval $(-a, a)$,
which means that it is necessarily empty or finite.

Now, recall that we want to investigate the cardinality of $A_p$, a subset of the unit
ball of $\mathbb{R}^p$. From (A), there exists $\gamma \in S^p$ such that $P(E(Z \mid \langle X^{(p)}, \gamma\rangle) = 0) < 1$. Then,
property (6.2) applied with some $a > 1$, $W_1 = Z$ and $W_2 = \langle X^{(p)}, \gamma\rangle$ implies that the set

$\{|b| \le a : E(Z \exp\{\langle X^{(p)}, b\gamma\rangle\}) = 0\}$

is empty or finite. Deduce that there exists $v^*$ in the unit ball of $\mathbb{R}^p$, arbitrarily close to
the origin, in particular with $\|v^*\| < 1/2$, such that $E(Z \exp\{\langle X^{(p)}, v^*\rangle\}) \ne 0$. Next, we
adapt the lines of the proof of Lemma 1 in Bierens (1990). Let $Z^* = Z \exp\{\langle X^{(p)}, v^*\rangle\}$.
By construction, $P(E(Z^* \mid x_1, \dots, x_l) = 0) < 1$, for $l = 1, \dots, p$. Define the sets

$A_l^* = \{(t_1, \dots, t_l) \in \mathbb{R}^l : \|(t_1, \dots, t_l)\| \le 3/2,\ E(Z^* \exp\{x_1 t_1 + \dots + x_l t_l\}) = 0\},$

$l = 1, \dots, p$. Since $|t_1 x_1 + \langle X^{(p)}, v^*\rangle| \le \{|t_1| + \|v^*\|\} \|X^{(p)}\| \le 2\|X^{(p)}\|$, deduce from
property (6.2) applied with $a = 2$, $W_1 = Z$ and $W_2 = \exp\{t_1 x_1 + \langle X^{(p)}, v^*\rangle\}$ that the set
$A_1^*$ is empty or finite. Now, define the set

$A_2^*(t_1) = \{|t_2| \le 3/2 : E(Z^* \exp\{x_1 t_1\} \exp\{x_2 t_2\}) = 0\}.$

If $t_1 \notin A_1^*$, replace $Z^*$ by $Z^* \exp\{x_1 t_1\}$ and use again property (6.2) with $a = 7/2$, $W_1 = Z$
and $W_2 = \exp\{t_1 x_1 + t_2 x_2 + \langle X^{(p)}, v^*\rangle\}$ to deduce that the set $A_2^*(t_1)$ is empty or finite.
This means that $A_2^*$ is contained in the union of some sets $B' \times \mathbb{R}$ and $\mathbb{R} \times B''$ where $B'$
and $B''$ are empty or finite. Repeat the arguments with $l = 3, \dots, p$ and deduce that $A_p^*$
has Lebesgue measure zero in $\mathbb{R}^p$. Since the norm of $v^*$ could be taken arbitrarily small
such that $A_p \subset A_p^*$, we can now easily deduce that $A_p$ has Lebesgue measure zero in the
unit ball of $\mathbb{R}^p$. The fact that $A_p$ is not dense in the unit ball of $\mathbb{R}^p$ is a direct consequence
of the fact that $A_p^*$ intersected with the unit ball of $\mathbb{R}^p$ is not dense. ■

[1] An easy way to check that it is indeed sufficient to derive such properties for $A_p$ is to represent the
sets in the hyperspherical coordinates.

6.2 Rates of convergence: technical lemmas 

For $\nu$ a probability measure on a sample space, $\mathcal{F}$ a class of functions and $\varepsilon > 0$, let
$N(\varepsilon, \mathcal{F}, L^2(\nu))$ denote the covering number, that is the minimal number of balls of radius
$\varepsilon$ in $L^2(\nu)$ needed to cover $\mathcal{F}$. See van der Vaart and Wellner (1996) or Kosorok (2008) for
the definitions. For real random variables, $A_n \asymp_P B_n$ means that there exists a constant
$C > 1$ such that $P(1/C \le A_n/B_n \le C)$ goes to 1 when $n$ grows. In the following
$C, C_1, c, c_1, \dots$ represent constants that may change from line to line.

Lemma 6.1 For any $p \ge 1$, let

$\mathcal{F}_{1p} = \{(v_1, v_2) \mapsto K(h^{-1}\langle v_1 - v_2, \gamma\rangle) : v_1, v_2 \in \mathbb{R}^p,\ \gamma \in S^p,\ h > 0\}$

and

$\mathcal{F}_{2p} = \{v \mapsto E[K(h^{-1}\langle X - v, \gamma\rangle)] : v \in \mathbb{R}^p,\ \gamma \in S^p,\ h > 0\}.$

If Assumption K-(a) holds, there exist constants $c_1, c_2, c_3 > 0$ such that for any $p \ge 1$ and
$0 < \varepsilon < 1$ and any $\nu_1$ probability measure on $\mathbb{R}^p \times \mathbb{R}^p$ and $\nu_2$ probability measure on $\mathbb{R}^p$,

$N(\varepsilon, \mathcal{F}_{jp}, L^2(\nu_j)) \le c_1 (c_2/\varepsilon)^{c_3 p}, \quad j = 1, 2.$ (6.3)

Proof. Since $K$ can be written as a difference of two monotone functions, the result for
$\mathcal{F}_{1p}$ is an easy consequence of Theorem 9.3, Lemmas 9.6 and 9.9 of Kosorok (2008)
and Lemma 16 of Nolan and Pollard (1987); see also their Lemma 22-(ii). For $\mathcal{F}_{2p}$, use
the bound for T\ v and Lemma 20 of Nolan and Pollard (1987). ■ 
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Lemma 6.2 Let Assumptions^ and\K\ hold true and let I be some strictly positive integer. 
For each n and p that may depend on n, define the U —processes 

y B (fa,fcO( 7 .Q = * U^K l h ((x t -X vl )), 1 eB p , h,k 2 e{0,2}. 

Then 

sup|V;( o > o )( 7 ;0|x P l, su P {1/|V;( 2 ' 2 )(7;0I} = Op(1) and sup \V™(r, l)\ = 0,(1). 

Proof. To simplify the writings, we write (resp. vjl ) instead of Vn°'°^ (resp. Vn )■ 
First consider the case k\ = hi = 0. Hoeffding's decomposition allows us to decompose the 
centered U— processes hVd (7; I) —'E[hVn°\'y; /)] as a sum of two degenerate [/—processes 
(7; Z) and (7; Z), 7 £ -Bp, of respective orders 1 and 2 that are indexed by families of 
functions obtained by finite sums of sets like T\ p and Tip in Lemma I6TT1 above. By Lemma 
16 of Nolan and Pollard (1987), deduce that the families indexing (7; Z) and (7; Z) 
are families with covering numbers bounded by polynomials in 1/e with coefficient and 
order depending on ci, C2 and C3 but independent of n and p. (When Z > 1, K should be 
replaced by K l in the definitions of T\ v and J~2 P , but given the properties of K(-) this has 
no impact on the conclusion.) Next, by Theorem 2 of Major (2006), sup 7gBp |V^( 7 ; l)\ = 
Of(n~ 1 h 1 / 2 p 3 / 2 Inn); see the proof of our Lemma [3.11 for an example of application of the 
result of Major (2006). On the other hand, by Theorem 2.14.1 or Theorem 2.14.9 of van 
der Vaart and Wellner (1996), we have sup 7&B |V^° (7; l)\ = Op(n~ 1 / 2 p 1 / 2 ). Gathering 

the rates and using; Assumption E-(b,c) we deduce that K (0) (7; I) - E[K (0) (7; I)] = o P (l), 
uniformly in 7 £ B p . Now, it remains to show that there exist constants c 1; C2 > such 
that ci < E[14 (0) ( 7 ; Z)] = Ef/r 1 ^ ((X 1 - X 2 , 7 ))] < c 2 , V7 £ B p and h sufficiently small. 
Using the properties of the Fourier and inverse Fourier transforms, Fubini theorem, the 
independence of X\ and X 2 and Plancherel theorem 

E[h~ 1 K l h ((X 1 -X 2ll })} = (27r)^/ 2 E / exp^(X 1 ,7)}exp{-^(X 2 , 7 )}^[A w ](t)rft 

Jr 

= (27T) 1 / 2 / \T[f,](t)\ 2 T[K l ](ht)dt 
Jr 

< (27T) 1 / 2 f \T[f 1 ]{t)\ 2 dt={2ixfl 2 [ f 2 (x)dx. (6.4) 
Jr Jr 

Assumption D-(c)(i) guarantees that $E[h^{-1}K_h^l(\langle X_1 - X_2, \gamma\rangle)]$ is uniformly bounded from above. On the other hand, using the positiveness of $\mathcal{F}[K]$ (hence of $\mathcal{F}[K^l]$), the fact that $\mathcal{F}[K^l]$ is necessarily bounded away from zero on compact intervals, the previous display and Assumption D-(c)(ii), deduce that there exist constants $c_3$ and $c_4$ such that $\forall p \ge 1$, $\forall \gamma \in B_p$ and $\forall h \le 1$ (say),

\[
E[h^{-1}K_h^l(\langle X_1 - X_2, \gamma\rangle)] \ge c_3\int_{|t|\le\varepsilon} |\mathcal{F}[f_\gamma](t)|^2\,dt \ge c_4 > 0.
\]

In the case $k_1 = k_2 = 2$, by Assumption D-(b), $E(V_n^{(2)}(\gamma; l)) \ge \underline{\sigma}^4 E[h^{-1}K_h^l(\langle X_1 - X_2, \gamma\rangle)]$, $\forall\gamma$. Next use again Hoeffding's decomposition for $V_n^{(2)}(\gamma; l) - E(V_n^{(2)}(\gamma; l))$. The degenerate $U$-statistics of order 1 and 2 can be treated with the same arguments as above. Deduce that $1/V_n^{(2)}(\gamma; l)$ is uniformly bounded in probability. The case $k_1 = 0$ and $k_2 = 2$ could be handled with similar arguments. ■
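As a hedged numerical illustration of the two-sided bound on $E[h^{-1}K_h(\langle X_1 - X_2,\gamma\rangle)]$ obtained in this proof, consider a hypothetical special case that is not taken from the paper: the projection $\langle X,\gamma\rangle$ is standard normal, $K$ is the standard Gaussian density and $l = 1$. Then the expectation equals the $N(0, 2 + h^2)$ density at zero, which stays below $\int f_\gamma^2 = 1/(2\sqrt{\pi})$ and away from zero for $h \le 1$. The function names below are illustrative only.

```python
import math

def gauss_pdf(x, var=1.0):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def expected_kernel(h, grid=20001, span=12.0):
    """Trapezoid approximation of E[h^{-1} K((V1 - V2)/h)].

    Assumption (illustration only): V1, V2 are i.i.d. N(0,1), so that
    V1 - V2 ~ N(0,2), and K is the N(0,1) density.
    """
    step = 2.0 * span / (grid - 1)
    total = 0.0
    for k in range(grid):
        d = -span + k * step
        weight = 0.5 if k in (0, grid - 1) else 1.0
        total += weight * gauss_pdf(d / h) / h * gauss_pdf(d, var=2.0)
    return total * step

def closed_form(h):
    """N(0, 2 + h^2) density at 0: the exact value in this Gaussian case."""
    return 1.0 / math.sqrt(2.0 * math.pi * (2.0 + h * h))

# Integral of the squared N(0,1) density, the upper bound in this special case.
INT_F_SQUARED = 1.0 / (2.0 * math.sqrt(math.pi))

for h in (1.0, 0.5, 0.1):
    print(h, expected_kernel(h), closed_form(h), INT_F_SQUARED)
```

In this toy case the expectation increases towards $\int f_\gamma^2$ as $h$ decreases, which matches the role of the upper and lower constants in the argument above.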



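Lemma 6.3 Under the conditions of Lemma 4.1, for any $\varepsilon > 0$,

\[
\sup_{\gamma\in S_p,\ t\in\mathbb{R}}\left|\frac{1}{n}\sum_{i=1}^n U_iK_h(\langle X_i,\gamma\rangle - t)\right| = O_P(p^{1/2}h^{1/2}n^{-1/2+\varepsilon}). \tag{6.5}
\]

Moreover, there exists $a > 0$ such that for any $\varepsilon > 0$,

\[
\sup_{\gamma\in B_p}\left|\frac{1}{n(n-1)h}\sum_{j\ne i} U_iK_h(\langle X_i - X_j,\gamma\rangle)\right| = O_P(n^{-1/2+\varepsilon}p^{1/2}h^{-1/2+a}). \tag{6.6}
\]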



Proof. Let $\mathcal{G}$ be a family of functions $|g| \le 1$ with covering number bounded by $(K/\varepsilon)^V$. With the notation of van der Vaart and Wellner (1996), let $\mathbb{G}_n g$, $g \in \mathcal{G}$, be the empirical process indexed by $\mathcal{G}$ and let $\|\mathbb{G}_n\|_{\mathcal{G}} = \sup_{\mathcal{G}} |\mathbb{G}_n g|$. From Theorem 2.14.16 of van der Vaart and Wellner, after tracing the constants in the proof, for any $\delta > 0$ there exists $C_\delta > 0$ independent of $n$ and $p$ such that

\[
P(\|\mathbb{G}_n\|_{\mathcal{G}} > t) \le C_\delta\left(\frac{1}{\sigma}\right)^{2V}\left(1 \vee \frac{t}{\sigma}\right)^{3V+\delta}\exp\left\{-\frac{1}{2}\,\frac{t^2}{\sigma^2 + (3+t)/\sqrt{n}}\right\} \tag{6.7}
\]



for every $t > 0$ and every $\sup_{\mathcal{G}} \mathrm{Var}(g) \le \sigma^2 \le 1$. Fix an arbitrary $\varepsilon > 0$, use the covering number of $\mathcal{F}_{1p}$ in Lemma 6.1 and apply this inequality with $g(U,X) = n^{-\varepsilon/2}U\,I(|U| \le n^{\varepsilon/2})K_h(\langle X,\gamma\rangle - t)$, $\sigma = ch^{1/2}$ and the threshold $t\,p^{1/2}h^{1/2}\ln^{1/2}n$ in place of $t$. Next derive the rate of the remainder $n^{-1}\sum_{i=1}^n U_iI(|U_i| > n^{\varepsilon/2})K_h(\langle X_i,\gamma\rangle - t)$ taking absolute values, recalling that $E(|U|^m) < \infty$ for every $m \ge 1$, and using Markov inequality. For the second quantity, apply the Hoeffding decomposition to the second order $U$-statistic defined by

\[
h(U_i, X_i, U_j, X_j) = n^{-\varepsilon/2}U_iI(|U_i| \le n^{\varepsilon/2})K_h(\langle X_i - X_j,\gamma\rangle).
\]

For the degenerate $U$-statistic of order 2 multiply by $h$ and proceed like in Lemma 3.1. To apply inequality (6.7) for the empirical process in Hoeffding decomposition we need a bound for the variance of the conditional expectation $E[h^{-1}K_h(\langle X_i - X_j,\gamma\rangle) \mid X_i]$. Let $\delta > 0$ such that $\int_{\mathbb{R}} f_\gamma^{2+\delta} \le C$, $\forall\gamma \in B_p$, for some constant $C$. By a change of variables, the boundedness of $K$, Jensen inequality, and again a change of variables we have for some constant $C'$,

\[
\begin{aligned}
E\{E^2[h^{-1}K_h(\langle X_i - X_j,\gamma\rangle) \mid X_i]\} &= \int_{\mathbb{R}}\left[\int_{\mathbb{R}} f_\gamma(u - th)K(t)\,dt\right]^2 f_\gamma(u)\,du \\
&\le C\int_{\mathbb{R}}\left(\int_{\mathbb{R}} f_\gamma^{2+\delta}(u - th)\,dt\right)^{2/(2+\delta)} f_\gamma(u)\,du \le C'h^{-2/(2+\delta)}.
\end{aligned}
\]

Use the covering number of $\mathcal{F}_{2p}$ in Lemma 6.1 and inequality (6.7) to deduce

\[
\sup_{\gamma\in B_p}\left|\frac{1}{n}\sum_{1\le i\le n} U_iI(|U_i| \le n^{\varepsilon/2})\,E[h^{-1}K_h(\langle X_i - X_j,\gamma\rangle) \mid X_i]\right| = O_P(p^{1/2}h^{-1/(2+\delta)}n^{-1/2+\varepsilon/2}\ln^{1/2}n).
\]

Use Markov inequality to bound $n^{-1}\sum_{i=1}^n U_iI(|U_i| > n^{\varepsilon/2})\,E[h^{-1}K_h(\langle X_i - X_j,\gamma\rangle) \mid X_i]$ and hence complete the proof. ■
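The variance bound used in this proof (change of variables, boundedness of $K$, Jensen inequality) can be unpacked as follows. This is only a sketch under the additional assumptions, made here for illustration, that $K \ge 0$, $\int K = 1$ and $\|K\|_\infty < \infty$; the constant $C$ is the one bounding $\int_{\mathbb{R}} f_\gamma^{2+\delta}$.

```latex
% Sketch: assumes K >= 0, \int K = 1, \|K\|_\infty < \infty (illustrative assumptions).
\begin{align*}
\Big[\int_{\mathbb{R}} f_\gamma(u-th)K(t)\,dt\Big]^{2+\delta}
  &\le \int_{\mathbb{R}} f_\gamma^{2+\delta}(u-th)K(t)\,dt
  && \text{(Jensen, $K(t)\,dt$ is a probability measure)} \\
  &\le \|K\|_\infty \int_{\mathbb{R}} f_\gamma^{2+\delta}(u-th)\,dt
   = \|K\|_\infty\, h^{-1}\int_{\mathbb{R}} f_\gamma^{2+\delta}(s)\,ds
  && \text{(change of variables $s = u - th$)} \\
  &\le \|K\|_\infty\, C\, h^{-1}.
\end{align*}
% Raise to the power 2/(2+\delta) and integrate against the density f_\gamma(u)du:
\[
\int_{\mathbb{R}}\Big[\int_{\mathbb{R}} f_\gamma(u-th)K(t)\,dt\Big]^{2} f_\gamma(u)\,du
  \le \big(\|K\|_\infty C\big)^{2/(2+\delta)}\, h^{-2/(2+\delta)} .
\]
```

Taking square roots gives a standard deviation of order $h^{-1/(2+\delta)} = h^{-1/2+a}$ with $a = \delta/\{2(2+\delta)\} > 0$, which is one way to read the exponent $-1/2 + a$ in (6.6).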

6.3 Testing for no-effect: proofs of the asymptotic results 

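Let

\[
v_n^2(\gamma_0^{(p)}) = \frac{2}{n(n-1)h}\sum_{j\ne i}\mathrm{Var}(U_i \mid \langle X_i,\gamma_0^{(p)}\rangle)\,\mathrm{Var}(U_j \mid \langle X_j,\gamma_0^{(p)}\rangle)\,K_h^2(\langle X_i - X_j,\gamma_0^{(p)}\rangle). \tag{6.8}
\]

Lemma 6.4 Let Assumptions D, K and hypothesis $H_0$ hold true. Then $\hat\tau_n^2(\gamma_0^{(p)}) = \tau_n^2(\gamma_0^{(p)})\{1 + o_P(1)\} = v_n^2(\gamma_0^{(p)})\{1 + o_P(1)\}$. Moreover, $\hat v_n^2 = \tau_n^2(\gamma_0^{(p)})\{1 + o_P(1)\}$, with $\hat v_n^2$ defined in (3.7), provided that condition (3.8) holds true.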

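Proof. First let us notice that for any $n$ and any $V_{1i}$, $V_{2i}$, $1 \le i \le n$, a set of i.i.d. random variables with $E(V_1^2 + V_2^2) < \infty$ and

\[
A_n = \frac{1}{n(n-1)h}\sum_{j\ne i} V_{1i}V_{2j}K_h^2(\langle X_i - X_j,\gamma_0^{(p)}\rangle),
\]

there exists some constant $C$ (independent of $n$) such that

\[
\begin{aligned}
\mathrm{Var}(A_n) &\le \frac{C}{n}\,\mathrm{Var}\big(V_{1i}V_{2j}h^{-1}K^2(h^{-1}\langle X_i - X_j,\gamma_0^{(p)}\rangle)\big) \\
&\le \frac{C}{nh^2}\,E\big[\zeta_1^2(\langle X_i,\gamma_0^{(p)}\rangle)\,\zeta_2^2(\langle X_j,\gamma_0^{(p)}\rangle)\,K^4(h^{-1}\langle X_i - X_j,\gamma_0^{(p)}\rangle)\big] \\
&\le \frac{C}{nh^2}\,E\big[\zeta_1^2(\langle X_i,\gamma_0^{(p)}\rangle)\big]\,E\big[\zeta_2^2(\langle X_j,\gamma_0^{(p)}\rangle)\big] = \frac{C}{nh^2}\,E(V_1^2)E(V_2^2)
\end{aligned}
\tag{6.9}
\]

where $\zeta_l^2(\langle X_i,\gamma_0^{(p)}\rangle) = E(V_{li}^2 \mid \langle X_i,\gamma_0^{(p)}\rangle)$, $l = 1, 2$. Since $nh^2 \to \infty$, we have $\mathrm{Var}(A_n) \to 0$.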

Now, to check $\hat\tau_n^2(\gamma_0^{(p)}) = \tau_n^2(\gamma_0^{(p)})\{1 + o_P(1)\}$ take $V_{1i} = V_{2i} = U_i^2$. We have $E[\hat\tau_n^2(\gamma_0^{(p)}) \mid X_1,\dots,X_n] = \tau_n^2(\gamma_0^{(p)})$ and

\[
E\{\hat\tau_n^2(\gamma_0^{(p)}) - \tau_n^2(\gamma_0^{(p)})\}^2 = E\{\mathrm{Var}[\hat\tau_n^2(\gamma_0^{(p)}) \mid X_1,\dots,X_n]\} \le \mathrm{Var}(\hat\tau_n^2(\gamma_0^{(p)})) \to 0. \tag{6.10}
\]

By the fact that $\mathrm{Var}(U \mid X)$ is bounded and bounded away from zero almost surely, and the fact that for $l = 2$ and $l = 4$, $E[h^{-1}K_h^l(\langle X_1 - X_2,\gamma\rangle)]$ is bounded and bounded away from zero $\forall p \ge 1$, $\forall\gamma \in S_p$ and $\forall h \le 1$, deduce that the expectation of $\hat\tau_n^2(\gamma_0^{(p)})$ stays away from zero and infinity and its variance tends to zero. This together with (6.10) allows us to conclude that $\hat\tau_n^2(\gamma_0^{(p)}) = \tau_n^2(\gamma_0^{(p)})\{1 + o_P(1)\}$. To obtain the same conclusion with $\tau_n^2(\gamma_0^{(p)})$ replaced by $v_n^2(\gamma_0^{(p)})$ it suffices to consider above conditional expectations given $\langle X_1,\gamma_0^{(p)}\rangle, \dots, \langle X_n,\gamma_0^{(p)}\rangle$. The arguments for $\hat v_n^2$ are similar and hence will be omitted. ■



Proof of Lemma 3.1. Let $M > 0$ be a real number that depends on $n$ in a way that will be specified later, define $\eta_i^M = U_iI(|U_i| \le M) - E(U_iI(|U_i| \le M) \mid X_i^{(p)})$ and consider the degenerate $U$-process

\[
U_n g = \frac{1}{n(n-1)}\sum_{j\ne i} g(U_i, X_i, U_j, X_j) = \frac{1}{n(n-1)}\sum_{j\ne i}\eta_i^M\eta_j^MK_h(\langle X_i - X_j,\gamma\rangle)
\]

defined by the functions $g$ indexed by $h$ and $\gamma \in S_p$. By Assumption D and K-(a), the arguments used in Lemma 6.1 above for the class $\mathcal{F}_{1p}$, and Lemma 9.9-(vi) of Kosorok (2008), the bounded family $\mathcal{F}_{3p} = \{g : \gamma \in S_p, h > 0\}$ has a covering number like in (6.3). By Theorem 2 of Major (2006) and its corollary, where we assume without loss of generality that $0 \le K(\cdot) \le 1$,



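\[
P\left(\sup_{\gamma\in S_p}\left|\frac{1}{n}\sum_{j\ne i}\frac{\eta_i^M}{M}\frac{\eta_j^M}{M}K_h(\langle X_i - X_j,\gamma\rangle)\right| > \frac{th^{1/2}p^{3/2}\ln n}{M^2}\right) \le C_1C_2\exp\left\{-C_3\,\frac{th^{1/2}p^{3/2}\ln n}{M^2\sigma_M}\right\}
\]

for any $t > 0$ provided

\[
n\sigma_M^2 \ge \frac{th^{1/2}p^{3/2}\ln n}{M^2\sigma_M} \ge C_4\,[p + \max(\ln C_2/\ln n, 0)]^{3/2}\ln\frac{2}{\sigma_M}, \tag{6.11}
\]

where $C_1, \dots, C_4 > 0$ are some constants independent of $n$, $h$ and $M$ and

\[
\sigma_M^2 = \sup_{\gamma\in S_p} E\left[\left(\frac{\eta_i^M\eta_j^M}{M^2}\right)^2K_h^2(\langle X_i - X_j,\gamma\rangle)\right].
\]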

From Assumption D-(b,c) and using the arguments as in the last part of the proof of Lemma 6.2 above, there is a constant $C > 0$ independent of $n$ such that $C^{-1} \le \sigma_M^2M^4/h \le C$. Take $M^4 = nhp^{-3/2}\ln^{-(1+\delta)}n \to \infty$ with $\delta > 0$ arbitrarily small. Hence $\sigma_M^2$ is of order $n^{-1}p^{3/2}\ln^{1+\delta}n \to 0$ and for any $t > 0$,

\[
n\sigma_M^2 \ge \frac{nh}{CM^4} = C^{-1}p^{3/2}\ln^{1+\delta}n > \frac{th^{1/2}p^{3/2}\ln n}{M^2\sigma_M} \tag{6.12}
\]

provided $n$ is large enough. On the other hand, for any constant $C' > 0$,

\[
\frac{th^{1/2}p^{3/2}\ln n}{M^2\sigma_M} \ge C^{-1/2}\,t\,p^{3/2}\ln n \ge C'p^{3/2}\ln n, \qquad n \to \infty, \tag{6.13}
\]

for any sufficiently large $t$. Since $(\ln n)^{-1}\ln(2/\sigma_M)$ is bounded by a positive constant as $n$ goes to $\infty$, Equations (6.12) and (6.13) show that (6.11) is satisfied for our $M$, with $n$ and $t$ large enough. By Theorem 2 of Major (2006), $U_n g = O_P(n^{-1}h^{1/2}p^{3/2}\ln n)$.

Now, it remains to study the tails of $U_i$, that is we have to derive the orders of the remainder terms

\[
2R_{1n} + R_{2n} = \frac{2}{n(n-1)}\sum_{j\ne i}\eta_i^M\xi_j^MK_h(\langle X_i - X_j,\gamma\rangle) + \frac{1}{n(n-1)}\sum_{j\ne i}\xi_i^M\xi_j^MK_h(\langle X_i - X_j,\gamma\rangle),
\]

where $\xi_i^M = U_i - \eta_i^M = U_iI(|U_i| > M) - E[U_iI(|U_i| > M) \mid X_i]$. Now, $E[\sup_\gamma |R_{1n}|] \le CE(|\eta_i^M||\xi_j^M|) \le 2CE(|U_i|)E(|\xi_j^M|) \le C'E(|\xi_j^M|)$, and thus by Hölder's and Chebyshev's inequalities

\[
E(|\xi_i^M|) \le 2E[|U_1|I(|U_1| > M)] \le 2E^{1/m}[|U_1|^m]\,P^{(m-1)/m}[|U_1| > M] \le 2E[|U_1|^m]\,M^{1-m}.
\]

Now it remains to choose $m$ sufficiently large such that $M^{1-m} = o(n^{-1}h^{1/2}p^{3/2}\ln n)$. With Assumption K-(b) and our choice of $M$, $m > 11$ will be sufficient. Also it is clear that $\sup_\gamma |R_{2n}|$ is of smaller order than $\sup_\gamma |R_{1n}|$.
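For readers who want the intermediate step, the truncation bound above follows from Hölder's inequality with exponents $m$ and $m/(m-1)$ combined with Chebyshev's (Markov's) inequality of order $m$; the display below only spells out that arithmetic.

```latex
\begin{align*}
E\big[|U_1|\,I(|U_1|>M)\big]
  &\le E^{1/m}\big[|U_1|^m\big]\;P^{(m-1)/m}\big(|U_1|>M\big)
  && \text{(H\"older)} \\
P\big(|U_1|>M\big)
  &\le M^{-m}\,E\big[|U_1|^m\big]
  && \text{(Chebyshev of order $m$)} \\
\Longrightarrow\quad
E\big[|U_1|\,I(|U_1|>M)\big]
  &\le E^{1/m}\big[|U_1|^m\big]\,\big(M^{-m}E\big[|U_1|^m\big]\big)^{(m-1)/m}
   = E\big[|U_1|^m\big]\,M^{1-m}.
\end{align*}
```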

To prove that the inverse of the variance estimate is bounded in probability, in view of Lemma 6.4 it remains to show that $1/\tau_n^2(\gamma)$, $\gamma \in B_p$, is uniformly bounded in probability. For this recall that $\sigma^2(X_i) \ge \underline{\sigma}^2$ and apply Lemma 6.2. Now the proof is complete. ■

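Proof of Lemma 3.2. By definition, $nh^{1/2}Q_n(\gamma_0^{(p)})/\hat v_n(\gamma_0^{(p)}) \le nh^{1/2}Q_n(\hat\gamma_n)/\hat v_n(\hat\gamma_n) - \alpha_nI(\hat\gamma_n \ne \gamma_0^{(p)})$. This implies that

\[
0 \le I(\hat\gamma_n \ne \gamma_0^{(p)}) \le nh^{1/2}\alpha_n^{-1}\left|Q_n(\hat\gamma_n)/\hat v_n(\hat\gamma_n) - Q_n(\gamma_0^{(p)})/\hat v_n(\gamma_0^{(p)})\right|.
\]

From Lemmas 3.1, 6.2 and 6.4,

\[
\left|Q_n(\hat\gamma_n)/\hat v_n(\hat\gamma_n) - Q_n(\gamma_0^{(p)})/\hat v_n(\gamma_0^{(p)})\right| \le 2\max\left\{\sup_{\gamma\in B_p}\{1/\hat\tau_n(\gamma)\},\ 1/\hat v_n\right\}\sup_{\gamma\in B_p}|Q_n(\gamma)| = O_P(n^{-1}h^{-1/2}p^{3/2}\ln n).
\]

Then $\alpha_np^{-3/2}/\ln n \to \infty$ yields $I(\hat\gamma_n \ne \gamma_0^{(p)}) = o_P(1)$. Thus $P(\hat\gamma_n \ne \gamma_0^{(p)}) = E[I(\hat\gamma_n \ne \gamma_0^{(p)})] \to 0$. ■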



Proof of Theorem 3.3. From Lemma 3.2 the probabilities of the events $\{Q_n(\hat\gamma_n) = Q_n(\gamma_0^{(p)})\}$ and $\{\hat\tau_n^2(\hat\gamma_n) = \hat\tau_n^2(\gamma_0^{(p)})\}$, with $\hat\tau_n^2(\cdot)$ defined in (3.6), both converge to 1. On the other hand, by Lemma 6.4 above $\hat\tau_n^2(\gamma_0^{(p)}) = \tau_n^2(\gamma_0^{(p)})\{1 + o_P(1)\}$. Moreover, $\hat v_n^2 = \tau_n^2(\gamma_0^{(p)})\{1 + o_P(1)\}$, with $\hat v_n^2$ defined in (3.7), provided that condition (3.8) holds true. Hence it suffices to derive the asymptotic distribution of $nh^{1/2}Q_n(\gamma_0^{(p)})/\hat\tau_n(\gamma_0^{(p)})$ under $H_0$. For this purpose we use Assumption D-(c)(iii) and proceed like in Theorem 3.3 and Lemma 6.2 of Lavergne and Patilea (2008); see also the CLT in Lemma 2 of Guerre and Lavergne (2005). Moreover we use our Lemma 6.2 with $k_1 = k_2 = 0$ and $l = 2$. To be exactly in the case of Lavergne and Patilea (2008), first consider $nh^{1/2}Q_n(\gamma_0^{(p)})/v_n(\gamma_0^{(p)})$ with $v_n(\gamma_0^{(p)})$ defined in (6.8). The arguments for the asymptotic normality of $nh^{1/2}Q_n(\gamma_0^{(p)})/v_n(\gamma_0^{(p)})$ are identical to those of Lavergne and Patilea and hence will be omitted. Finally, by Lemma 6.4 $\hat\tau_n(\gamma_0^{(p)}) = v_n(\gamma_0^{(p)})\{1 + o_P(1)\}$ and the stated result follows. ■

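Proof of Theorem 3.4. The proof is based on inequality (3.9). Since $E(U^2 \mid X) \ge \underline{\sigma}^2 + r_n^2\delta^2(X)$, $E(U \mid X) = r_n\delta(X)$, and $\mathrm{Var}(U \mid \langle X,\gamma_0^{(p)}\rangle) \ge \underline{\sigma}^2 + r_n^2\mathrm{Var}(\delta(X) \mid \langle X,\gamma_0^{(p)}\rangle)$, clearly the variance estimate $\hat v_n(\gamma_0^{(p)})$ stays away from zero. Hence it suffices to look at the behavior of $Q_n(\gamma)$. By Lemma 2.1-(B) there exist $p_0$ and $\tilde\gamma \in B_{p_0} \subset S_{p_0}$ ($p_0$ and $\tilde\gamma$ independent of $n$) such that $E[\delta(X) \mid \langle X,\tilde\gamma\rangle] \ne 0$. Since $\max_{\gamma\in B_p}Q_n(\gamma) \ge Q_n(\tilde\gamma)$ for any $p \ge p_0$, it suffices to investigate the rate of $Q_n(\tilde\gamma)$. We can write

\[
\begin{aligned}
Q_n(\tilde\gamma) &= \frac{1}{n(n-1)h}\sum_{j\ne i} U_i^0U_j^0K_h(\langle X_i - X_j,\tilde\gamma\rangle) + \frac{2r_n}{n(n-1)h}\sum_{j\ne i} U_i^0\delta(X_j)K_h(\langle X_i - X_j,\tilde\gamma\rangle) \\
&\quad + \frac{r_n^2}{n(n-1)h}\sum_{j\ne i}\delta(X_i)\delta(X_j)K_h(\langle X_i - X_j,\tilde\gamma\rangle) \\
&=: Q_{0n}(\tilde\gamma) + 2r_nQ_{1n}(\tilde\gamma) + r_n^2Q_{2n}(\tilde\gamma).
\end{aligned}
\]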

Since $\tilde\gamma$ is fixed (and of finite dimension), $Q_{0n}(\tilde\gamma) = O_P(n^{-1}h^{-1/2})$ (cf. proof of Theorem 3.3). The $U$-statistic $Q_{1n}(\tilde\gamma)$ can be decomposed in a degenerate $U$-statistic of order 2 with the rate $O_P(h^{-1}n^{-1}) = O_P(n^{-1/2})$ and the average of centered variables

\[
\frac{1}{n}\sum_{1\le i\le n} U_i^0\,E[\delta(X_j)h^{-1}K_h(\langle X_i - X_j,\tilde\gamma\rangle) \mid X_i].
\]

Hence it suffices to bound $v^2 = E\{(U_i^0)^2E^2[\delta(X_j)h^{-1}K_h(\langle X_i - X_j,\tilde\gamma\rangle) \mid X_i]\}$. There are several sets of assumptions on $\delta$ and $f_{\tilde\gamma}$ that could be used. Condition (i) implies that the map $x \mapsto E[h^{-1}K_h(\langle x - X_j,\tilde\gamma\rangle)]$ is bounded. This combined with the bounded conditional variance of $U_i^0$ and the finite second order moment of $\delta(X_j)$ yield $v^2 \le c$ for some constant $c > 0$. Similar arguments could be combined with the condition (ii) to obtain the boundedness of $v^2$. Finally, if condition (iii) is met, let $V_i = \langle X_i,\tilde\gamma\rangle$ and $\bar\delta(V_j) = E[\delta(X_j) \mid V_j]$. Then using the inverse Fourier transform device we have

\[
\begin{aligned}
E[\delta(X_j)h^{-1}K_h(V_i - V_j) \mid X_i] &= E\left[\bar\delta(V_j)\,(2\pi)^{-1/2}\int_{\mathbb{R}}\exp\{it(V_i - V_j)\}\,\mathcal{F}[K](ht)\,dt \;\Big|\; V_i\right] \\
&= \int_{\mathbb{R}}\exp\{itV_i\}\,\mathcal{F}[\bar\delta f_{\tilde\gamma}](t)\,\mathcal{F}[K](ht)\,dt.
\end{aligned}
\]

Take absolute value in the last integral, use the fact that $\mathcal{F}[\bar\delta f_{\tilde\gamma}] \in L^1(\mathbb{R})$, Lebesgue dominated convergence theorem and the fact that $\mathcal{F}[K](ht) \to \mathcal{F}[K](0)$ as $h \to 0$ to deduce that $E[\delta(X_j)h^{-1}K_h(V_i - V_j) \mid X_i]$ is bounded, and so is $v^2$. Deduce that with any of the conditions (i) to (iii), $Q_{1n}(\tilde\gamma) = O_P(n^{-1/2})$. Finally, it is easy to show that $\mathrm{Var}[Q_{2n}(\tilde\gamma)] \to 0$ (see, e.g., the proof of equation (26) in Lavergne and Patilea (2008)). It remains to study

\[
E[Q_{2n}(\tilde\gamma)] = (2\pi)^{1/2}\int_{\mathbb{R}} |\mathcal{F}[\bar\delta f_{\tilde\gamma}]|^2(t)\,\mathcal{F}[K](ht)\,dt.
\]

If condition (i) or (ii) holds true, $\bar\delta f_{\tilde\gamma} \in L^2(\mathbb{R})$ and by Plancherel theorem and Lebesgue dominated convergence theorem, $E[Q_{2n}(\tilde\gamma)] \to \int_{\mathbb{R}} |\bar\delta f_{\tilde\gamma}|^2 > 0$. If condition (iii) is met, $\mathcal{F}[\bar\delta f_{\tilde\gamma}] \in L^2(\mathbb{R})$ and since $\bar\delta f_{\tilde\gamma} \in L^1(\mathbb{R})$, deduce that $\bar\delta f_{\tilde\gamma} \in L^2(\mathbb{R})$ and continue with the same arguments. Deduce that with any of the conditions (i) to (iii), $Q_{2n}(\tilde\gamma) \asymp O_P(1)$. Collecting the rates, we obtain the result. ■

6.4 Testing the functional linear model: proofs of the results 

To simplify notation, in this section we write $\|\cdot\|$ instead of $\|\cdot\|_{L^2}$.

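Proof of Lemma 4.1. By simple calculations, we have $\hat U_i = U_i - \langle\hat b - b, X_i - \overline X_n\rangle - \overline U_n$.

Let $K_{h,ij}(\gamma)$ be a short notation for $K_h(\langle X_i - X_j,\gamma\rangle)$. We have the following decomposition

\[
Q_n(\gamma; \hat a, \hat b) = Q_n(\gamma) - 2V_1(\gamma) - 2V_2(\gamma) + V_3(\gamma) + V_4(\gamma) + 2V_5(\gamma)
\]

where

\[
\begin{aligned}
V_1 &= \frac{\overline U_n}{n(n-1)h}\sum_{j\ne i} U_iK_{h,ij}(\gamma), \qquad
V_2 = \Big\langle\hat b - b,\ \frac{1}{n(n-1)h}\sum_{j\ne i} U_i\{X_j - \overline X_n\}K_{h,ij}(\gamma)\Big\rangle, \\
V_3 &= \frac{1}{n(n-1)h}\sum_{j\ne i}\langle\hat b - b, X_i - \overline X_n\rangle\langle\hat b - b, X_j - \overline X_n\rangle K_{h,ij}(\gamma), \\
V_4 &= \frac{\overline U_n^{\,2}}{n(n-1)h}\sum_{j\ne i} K_{h,ij}(\gamma), \qquad
V_5 = \Big\langle\hat b - b,\ \frac{\overline U_n}{n(n-1)h}\sum_{j\ne i}\{X_i - \overline X_n\}K_{h,ij}(\gamma)\Big\rangle.
\end{aligned}
\]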

To prove the rate in the first part of (4.14) we will show that

\[
\sup_{\gamma\in S_p} nh^{1/2}|V_j| = o_P(1),
\]

for $j = 1$ to $j = 5$. First let us notice that by Fubini Theorem, $E(\|\overline X_n - E(X)\|^2) = n^{-1}\int_0^1\mathrm{Var}(X(t))\,dt$ and so $\|\overline X_n - E(X)\| = O_P(n^{-1/2})$.
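The Fubini step can be written out as follows. The sketch assumes, as is implicit in the statement, that the $X_i$ are i.i.d. with $\int_0^1\mathrm{Var}(X(t))\,dt < \infty$; the constant $c$ below is an arbitrary positive number introduced only for the Markov bound.

```latex
\begin{align*}
E\big(\|\overline X_n - E(X)\|^2\big)
  &= E\int_0^1 \big\{\overline X_n(t) - E X(t)\big\}^2\,dt
   = \int_0^1 E\big\{\overline X_n(t) - E X(t)\big\}^2\,dt
  && \text{(Fubini)} \\
  &= \int_0^1 \mathrm{Var}\big(\overline X_n(t)\big)\,dt
   = \frac{1}{n}\int_0^1 \mathrm{Var}\big(X(t)\big)\,dt
  && \text{(i.i.d.\ sample)} .
\end{align*}
% Markov's inequality then gives, for any c > 0,
\[
P\big(\|\overline X_n - E(X)\| > c\,n^{-1/2}\big)
  \le \frac{n\,E\|\overline X_n - E(X)\|^2}{c^2}
  = \frac{1}{c^2}\int_0^1 \mathrm{Var}\big(X(t)\big)\,dt ,
\]
% which can be made arbitrarily small by taking c large, i.e. the norm is O_P(n^{-1/2}).
```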

For $V_1$ use the fact that $\overline U_n = O_P(n^{-1/2})$ and apply Lemma 6.3. Thus there exists
a > and < e < 2a(l - 2() such that 

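\[
\sup_{\gamma\in S_p} nh^{1/2}|V_1| = nh^{1/2}\,O_P(n^{-1/2})\,O_P(n^{-1/2+\varepsilon}p^{1/2}h^{-1/2+a}) = o_P(1).
\]

To derive the rate of $V_2$ let us write

\[
V_2 = \Big\langle\hat b - b,\ \frac{1}{n(n-1)h}\sum_{j\ne i} U_i\{X_j - E(X)\}K_{h,ij}(\gamma)\Big\rangle - \big\langle\hat b - b,\ \overline X_n - E(X)\big\rangle\,\frac{1}{n(n-1)h}\sum_{j\ne i} U_iK_{h,ij}(\gamma) =: V_{21} - V_{22}.
\]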

By Cauchy-Schwarz inequality, the rate of \\X n — E(X)|| and Lemma W3\ sup 7€iSP | V22 1 
Op (n -1 / 2 1| b — b\\) = Op(n~ 2p ). For the rate of V21 let us write 



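\[
\frac{n-1}{n}\,V_{21} = \frac{1}{n^2h}\sum_{1\le i\le n}\sum_{1\le j\le n} U_iK_{h,ij}(\gamma)\,\langle\hat b - b, X_j - E(X)\rangle - \frac{K(0)}{n^2h}\sum_{1\le i\le n} U_i\,\langle\hat b - b, X_i - E(X)\rangle =: V_{211} - V_{212}.
\]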



By Cauchy-Schwarz inequality and the law of large numbers with $|U_i|\,\|X_i - E(X)\|$, $V_{212} = n^{-1}h^{-1}O_P(1)\,O_P(\|\hat b - b\|)$. Next, by Cauchy-Schwarz inequality we can write

\[
|V_{211}| \le h^{-1}\Big\{\sup_{\gamma,\,t}|Z_n(\gamma,t)|\Big\}\,\|\hat b - b\|\,\|\overline X_n - E(X)\|,
\]

where

\[
Z_n(\gamma,t) = \frac{1}{n}\sum_{1\le i\le n} U_iK_h(\langle X_i,\gamma\rangle - t).
\]

Apply Lemma 6.3 to deduce that there exists some small $\varepsilon > 0$ such that

\[
\sup_{\gamma\in S_p} nh^{1/2}|V_{211}| = nh^{-1/2}\,O_P(h^{1/2}n^{-1/2+\varepsilon}p^{1/2})\,O_P(\|\hat b - b\|)\,O_P(n^{-1/2}) = o_P(1).
\]

Deduce that

\[
\sup_{\gamma\in S_p} nh^{1/2}|V_2| = o_P(1).
\]

For $V_3$ take absolute values and use Cauchy-Schwarz inequality and triangle inequality:

\[
|V_3| \le \frac{\|\hat b - b\|^2}{n(n-1)h}\sum_{j\ne i}\big\{\|X_i - E(X)\| + \|\overline X_n - E(X)\|\big\}\big\{\|X_j - E(X)\| + \|\overline X_n - E(X)\|\big\}K_{h,ij}(\gamma).
\]

Apply Lemma 6.2 three times and deduce that

\[
\sup_{\gamma\in S_p} nh^{1/2}|V_3| = nh^{1/2}\,O_P(\|\hat b - b\|^2)\,O_P(1) = o_P(1).
\]

For $V_4$ apply Lemma 6.2 with $k_1 = 0$, $k_2 = 0$ and $l = 1$ and the rate of $\overline U_n$ to deduce

\[
\sup_{\gamma\in S_p} nh^{1/2}|V_4| = nh^{1/2}\,O_P(n^{-1})\,O_P(1) = o_P(1).
\]

Finally, let us write

\[
V_5 = \frac{\overline U_n}{n(n-1)h}\sum_{j\ne i}\langle\hat b - b, X_i - E(X)\rangle K_{h,ij}(\gamma) + \frac{\overline U_n}{n(n-1)h}\sum_{j\ne i}\langle\hat b - b, E(X) - \overline X_n\rangle K_{h,ij}(\gamma) =: V_{51} + V_{52}.
\]

By Cauchy-Schwarz inequality and Lemma 6.2 with $k_1 = 0$, $k_2 = 2$, $l = 1$ and $U^2$ replaced by $\|X - E(X)\|$,

\[
\sup_{\gamma\in S_p} nh^{1/2}|V_{51}| = nh^{1/2}\,O_P(n^{-1/2})\,O_P(\|\hat b - b\|)\,O_P(1) = n^{1/2}h^{1/2}\,O_P(\|\hat b - b\|) = o_P(1).
\]

Next, similar arguments apply for the uniform rate of $V_{52}$. Deduce that

\[
\sup_{\gamma\in S_p} nh^{1/2}|V_5| = o_P(1).
\]

The arguments for the rate in the second part of (4.14) are similar and hence will be omitted. ■
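The six-term decomposition of $Q_n(\gamma;\hat a,\hat b)$ used in this proof is a purely algebraic identity, so it can be checked numerically. The sketch below is only an illustration: the scalars `a[i]` stand in for $\langle\hat b - b, X_i - \overline X_n\rangle$, the symmetric weights `w[i][j]` stand in for $K_{h,ij}(\gamma)/\{n(n-1)h\}$, and all numbers are arbitrary.

```python
import math

def pair_sum(x, y, w):
    """Sum over i != j of x[i] * y[j] * w[i][j]."""
    n = len(x)
    return sum(x[i] * y[j] * w[i][j]
               for i in range(n) for j in range(n) if i != j)

def decomposition(u, a, w):
    """Return (lhs, rhs) of the identity
    Q_n(residuals) = Q_n - 2 V1 - 2 V2 + V3 + V4 + 2 V5,
    where the residuals are u_hat[i] = u[i] - a[i] - mean(u)
    and w is a symmetric weight matrix.
    """
    n = len(u)
    u_bar = sum(u) / n
    ones = [1.0] * n
    u_hat = [u[i] - a[i] - u_bar for i in range(n)]
    lhs = pair_sum(u_hat, u_hat, w)
    q = pair_sum(u, u, w)
    v1 = u_bar * pair_sum(u, ones, w)
    v2 = pair_sum(u, a, w)
    v3 = pair_sum(a, a, w)
    v4 = u_bar ** 2 * pair_sum(ones, ones, w)
    v5 = u_bar * pair_sum(a, ones, w)
    rhs = q - 2.0 * v1 - 2.0 * v2 + v3 + v4 + 2.0 * v5
    return lhs, rhs

# Arbitrary illustrative inputs.
u = [0.5, -1.2, 0.7, 0.3]
a = [0.1, -0.2, 0.05, 0.4]
scores = [0.0, 0.4, -0.3, 1.1]
n, h = len(u), 0.5
w = [[math.exp(-0.5 * ((scores[i] - scores[j]) / h) ** 2) / (n * (n - 1) * h)
      for j in range(n)] for i in range(n)]

lhs, rhs = decomposition(u, a, w)
print(lhs, rhs)
```

The identity relies on the symmetry of the weights, which is what allows the cross terms to be grouped with the factors 2 in front of $V_1$, $V_2$ and $V_5$.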

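Proof of Lemma 4.3. Let $\hat g$ (resp. $\hat g^0$) be the random function defined in (4.16) that one would obtain under the null (resp. alternative) hypothesis, that is with covariates $X_i$ and responses $a + \langle b, X_i\rangle + U_i^0$ (resp. $a + \langle b, X_i\rangle + \delta(X_i) + U_i^0$). We can write

\[
\begin{aligned}
\|\hat b^0 - \hat b\|^2 &= \sum_{j=1}^m(\hat b_j^0 - \hat b_j)^2 = \sum_{j=1}^m\hat\theta_j^{-2}\,|\langle\hat g - \hat g^0, \hat\phi_j\rangle|^2 \le \sum_{j=1}^m\|\hat g - \hat g^0\|^2\,\|\hat\phi_j\|^2\,\hat\theta_j^{-2} \\
&= \int_0^1\Big(\frac{1}{n}\sum_{i=1}^n\delta(X_i)\{X_i(u) - \overline X_n(u)\}\Big)^2du\ \sum_{j=1}^m\hat\theta_j^{-2} =: T_n\sum_{j=1}^m\hat\theta_j^{-2}.
\end{aligned}
\]

We have

\[
\begin{aligned}
E\left[\int_0^1\Big(\frac{1}{n}\sum_{i=1}^n\delta(X_i)\{X_i(u) - EX(u)\}\Big)^2du\right] &= \frac{1}{n^2}\int_0^1\sum_{i=1}^n E\big[\delta^2(X_i)\{X_i(u) - EX_i(u)\}^2\big]\,du \\
&= \frac{1}{n}\int_0^1 E\big[\delta^2(X_1)\{X_1(u) - EX_1(u)\}^2\big]\,du \\
&\le \frac{1}{n}\,E^{1/2}[\delta^4(X)]\,E[\|X - EX\|^2],
\end{aligned}
\]

where for the second equality we used the fact that $E[\delta(X)\{X - EX\}] = 0$. On the other hand, since $E[\delta(X)] = 0$, by the law of large numbers $\overline{\delta(X)}_n = n^{-1}\sum_{i=1}^n\delta(X_i) = o_P(1)$.

Recall that $\|\overline X_n - E(X)\| = O_P(n^{-1/2})$. Deduce that

\[
\int_0^1\Big(\frac{1}{n}\sum_{i=1}^n\delta(X_i)\{\overline X_n(u) - EX(u)\}\Big)^2du = \overline{\delta(X)}_n^{\,2}\,\|\overline X_n - E(X)\|^2 = o_P(n^{-1}),
\]

and finally that $T_n = O_P(n^{-1})$. For the last part, use Theorem 1 of Hall and Horowitz (2007) which provides the rate of $\int_0^1\{\hat b^0(u) - b(u)\}^2du$. Next, let us recall that Assumption P-(c) implies $\theta_j \ge cj^{-\alpha}$ for some constant $c$ and thus $\sum_{j=1}^m\theta_j^{-2} = O(n^{(2\alpha+1)/(\alpha+2\beta)}) = o(n)$ provided that $m \asymp n^{1/(\alpha+2\beta)}$. Finally, one can deduce from the equations (5.6) to (5.9) of Hall and Horowitz (2007) that $\sum_{j=1}^m(\hat\theta_j^{-2} - \theta_j^{-2}) = o_P(n)$. Now the proof is complete. ■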

Proof of Theorem 4.4. Like in the proof of Theorem 3.4 it suffices to show that $Q_n(\gamma) \asymp_P r_n^2$ for some fixed $p$ sufficiently large and $\gamma \in B_p$, where

\[
Q_n(\gamma) = \frac{1}{n(n-1)h}\sum_{j\ne i}\hat U_i\hat U_jK_h(\langle X_i - X_j,\gamma\rangle).
\]

Here $\hat U_i$ are defined as in (4.19). There are 15 cross-product terms, all of them similar or identical to those analyzed in the proofs of Theorem 3.4 and Lemma 4.1. For the sake of brevity we omit the details. ■